May 13th, 2021
Hey, it’s Ivan – developer here at Datawrapper. This week we’ll look at why the Cebuano and Swedish editions of Wikipedia have the 2nd and 3rd highest article counts of all the language editions.
Wikipedia is divided by language into separate editions – there are currently 317 in total. The three editions with the most articles at the time of writing are:
English holds the top spot – this is presumably because it’s the largest language by number of speakers, with approximately 1.268 billion speakers. But how can Cebuano, with an estimated 20 million speakers, and Swedish, with 10 million speakers, have the 2nd and 3rd highest number of Wikipedia articles?
It turns out that there is a catch: most articles in the Cebuano and Swedish Wikipedias were not written by humans.
Up until 2012, the number of articles in most Wikipedias was growing “organically” – that is, human authors were writing them. Then, in 2012, a bot called Lsjbot was launched. It started writing articles for the Swedish and Cebuano Wikipedias at lightning speed. Within a few years, it had lifted these two editions into 2nd and 3rd place.
You might ask: but why specifically Swedish and Cebuano? It turns out that Sverker Johansson, the programmer who created Lsjbot, is Swedish. And Cebuano is the native language of his wife.
Each Wikipedia edition has its own rules on which articles are accepted and whether bots can write them. The Swedish and Cebuano Wikipedia communities evidently decided to allow submissions from bots, but at some point around 2017–2018 this decision was reversed. As can be seen in the chart, the rate of new articles has flattened out considerably since then.
The articles written by Lsjbot are typically very short and factual and have been criticized for lacking meaningful content. So we’re still some way away from quality Wikipedia content being automatically generated by bots.
There are other bots that write Wikipedia articles, but none of them have been as prolific as Lsjbot.
Getting the data was easy thanks to the Wikipedia Statistics portal, which I can recommend if you want to explore data on Wikipedia. Here is an example URL which shows the monthly count of articles on the English Wikipedia.
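If you want to dig into the numbers yourself, one simple thing to try is turning the cumulative monthly article counts into per-month increases, which makes a sudden bot-driven jump easy to spot. The snippet below is only a minimal sketch: it assumes you have saved the monthly counts in a CSV with two columns named `month` (as `YYYY-MM`) and `articles`. Those column names, the function name `monthly_new_articles`, and the sample numbers are all made up for illustration and are not the portal's actual export format or real Wikipedia statistics.

```python
import csv
import io


def monthly_new_articles(csv_text):
    """Turn cumulative monthly article counts into per-month increases.

    Expects CSV text with the (assumed) columns "month" and "articles".
    Returns a list of (month, new_articles) tuples, starting with the
    second month, since the first month has no predecessor to compare to.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # "YYYY-MM" strings sort chronologically when sorted as plain text.
    rows.sort(key=lambda row: row["month"])
    increases = []
    for previous, current in zip(rows, rows[1:]):
        delta = int(current["articles"]) - int(previous["articles"])
        increases.append((current["month"], delta))
    return increases


# Made-up numbers purely for illustration, not real Wikipedia statistics.
SAMPLE = """month,articles
2012-01,1000
2012-02,1100
2012-03,1900
"""

for month, new_articles in monthly_new_articles(SAMPLE):
    print(month, new_articles)
```

With real data, a month whose increase is many times larger than its neighbours' would be a good place to start looking for bot activity.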
To visualize the data, I used a line chart, which is perfect for illustrating trend changes over time. To focus on the story, I de-emphasized the lines of Wikipedias other than English, Cebuano, and Swedish by using a gray color. Finally, I used a highlight range and a text annotation to communicate the Lsjbot launch date.
Maybe next time you’re on Wikipedia, you’ll be reading an article written by a bot! Let me know on Twitter or at firstname.lastname@example.org if you have any questions about the chart or the article. We’ll see you next week!