Gender bias in Wikipedia. Analysing behaviour linked to editing and the collaborative knowledge creation process.

Emily McPherson College Library, Russell St., circa 1960s. Unsplash

The research project has been carried out in three stages: initial exploratory research supported by “La Caixa” Foundation (RP802), a doctoral thesis carried out with a UOC grant, and an RDI project funded by the Spanish Ministry of Science and Innovation (PID2020-116936RA-I00). This IN3 blog post takes a look at these three stages.

Wikipedia, one of the internet’s most popular sites and available in over 300 languages, suffers from a gender bias that it has been unable to fix. The UOC has been studying various aspects of this issue to gain a better understanding and to provide possible solutions to help make this digital encyclopaedia a more diverse place.

Initial research: Where Are the Women in Wikipedia?

The interdisciplinary project “WAWW – Where Are the Women in Wikipedia?” (2018), led by Julio Meneses from the GenTIC research group, took an initial look at the gender gap and showed that only 11.3% of editors in the Spanish-language Wikipedia site were women, as can be seen in this poster. There are many studies on the gender gap in the English-language Wikipedia, the one with the most content, but fewer studies have focused specifically on the Spanish-language Wikipedia.

Wikimujeres (Wikipedia)

Qualitative results from this research were presented in an article on the voices of female editors of the Spanish-language Wikipedia, published in El Profesional de la Información. Among other matters, the article explores the strategies used to ensure that women remain active in a male-dominated online environment, such as the creation of edit-a-thons for women (“Wikiquedadas”) or participation in smaller communities. The article also debunks the common misconception that a lack of digital skills is one of the reasons for the gap. Rather, the female editors who took part in our study agreed that the main factor affecting the extent of their contributions to the online encyclopaedia was the time available to them, which is reduced by work and family obligations.

This preliminary research also resulted in an article published in PLOS One in which, using quantitative analysis, we compared the editing patterns of men and women and made a few recommendations to help female editors remain active after their initial Wikipedia experience. Instructions for reproducing the analysis, which uses data taken from a copy of the Wikipedia database, were published in four entries on the blog of the UOC’s Faculty of Computer Science, Multimedia and Telecommunications by Julià Minguillón, a member of that faculty.

PhD thesis: Wikipedia and gender

While the WAWW project analysed the role of female editors, the PhD thesis on Wikipedia and gender currently being developed by David Ramírez-Ordóñez shifts the focus to look deeper into this complex problem and analyse Wikipedia content. This stage of the research project focuses on the process of generating biographies, with the aim of describing the gender gap in terms of the creation of new articles. A poster has been published with the research plan.

This thesis focuses on the biographies of male and female scientists in the English Wikipedia, with the aim of finding patterns that can help identify the gender gap in other languages, which will be addressed in the third stage of the research project. One challenge is that, as well as analysing the biographies that were ultimately published, the thesis also tracks those that were deleted and are therefore no longer available in Wikipedia. It analyses the decision-making process of the Wikipedia community: how consensus on the relevance of knowledge is reached, and how decision makers discuss and agree on which content should be excluded from Wikipedia.

In order to analyse the content that has been rejected for inclusion on Wikipedia, we used tools to recover the traces of the discarded biographies, and we then used them to obtain clues about content that is no longer available. We even used digital preservation tools to access deleted articles. We did this by working along two lines.

First, we analysed the deliberations in which Wikipedians decide what constitutes valid knowledge and which content is not sufficiently notable to be included in Wikipedia. We did this by analysing the discourse around the biographies included in the “Deletion queries” category, looking at the arguments made in the discussions and at the votes on the biographies in question that led to a decision. The tools used in this process were the Wikipedia API (for example, for Judy Endow’s biography) and the Internet Archive’s Wayback Machine (for example, for Pamela Jones’ biography).
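As a rough illustration of how the two tools named above can be queried, the following Python sketch builds a MediaWiki Action API request for a page's deletion log and a Wayback Machine availability request for an archived copy. It is not the code used in the thesis. The helper names, the choice of the `logevents` list, and the user-agent string are assumptions made for this example, and the page titles are taken from the examples mentioned in the text.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"
WAYBACK_API = "https://archive.org/wayback/available"


def deletion_log_url(title: str, limit: int = 10) -> str:
    """Build an Action API query for the deletion log entries of one page."""
    params = {
        "action": "query",
        "list": "logevents",
        "letype": "delete",   # only deletion-related log entries
        "letitle": title,     # the page the log entries refer to
        "lelimit": limit,
        "format": "json",
    }
    return f"{WIKIPEDIA_API}?{urlencode(params)}"


def wayback_lookup_url(title: str, timestamp: str = "") -> str:
    """Build a Wayback Machine availability query for an article URL.

    `timestamp` (YYYYMMDD, optional) asks for the snapshot closest to that date.
    """
    page_url = "https://en.wikipedia.org/wiki/" + title.replace(" ", "_")
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{WAYBACK_API}?{urlencode(params)}"


def closest_snapshot(payload: dict):
    """Return the archived URL from an availability response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body (requires network access)."""
    # Hypothetical user agent; replace with a descriptive one of your own.
    request = Request(url, headers={"User-Agent": "gender-gap-sketch/0.1 (example)"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```

A typical use would be `fetch_json(deletion_log_url("Judy Endow"))` to list deletion log entries, and `closest_snapshot(fetch_json(wayback_lookup_url("Pamela Jones")))` to locate an archived copy, if one exists.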

Second, we studied the evolution of content tagged as “requiring improvement”. We compared biographies of male and female STEM (science, technology, engineering and mathematics) scientists that had been flagged for improvement with unflagged ones. This comparison allowed us to analyse which sources of information are used as the factual basis of biographies, since such sources are directly related to Wikipedia’s basic content policies for deciding whether or not an article is notable: neutral point of view, verifiability, and no original research.
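The kind of comparison described above can be sketched in Python as follows: detect whether an article's wikitext carries a maintenance template, and count the domains of the sources it cites. This is only an illustration under stated assumptions. The text does not say which tags or which source classification the study used, so the template list and the helper names below are hypothetical choices for this example.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed set of English Wikipedia maintenance templates; the tags actually
# examined in the study are not specified in the text.
MAINTENANCE_TEMPLATES = (
    "notability",
    "more citations needed",
    "blp sources",
    "primary sources",
)

# Captures the template name in "{{Name|...}}" or "{{Name}}".
TEMPLATE_RE = re.compile(r"\{\{\s*([^|}]+?)\s*(?:\||\}\})")
# Captures http(s) URLs up to whitespace or wikitext delimiters.
URL_RE = re.compile(r"https?://[^\s|\]}<>\"]+")


def is_flagged(wikitext: str) -> bool:
    """True if the wikitext uses any of the assumed maintenance templates."""
    names = {m.group(1).strip().lower() for m in TEMPLATE_RE.finditer(wikitext)}
    return any(name in names for name in MAINTENANCE_TEMPLATES)


def source_domains(wikitext: str) -> Counter:
    """Count how often each domain appears among the URLs cited in the wikitext."""
    domains = Counter()
    for url in URL_RE.findall(wikitext):
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains[host] += 1
    return domains
```

Applying `is_flagged` and `source_domains` to each biography, and then grouping the results by the subject's gender, would give the flagged versus unflagged breakdown of sources that the comparison relies on.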

Project for the Spanish Ministry of Science and Innovation: Women & Wikipedia

In the Women & Wikipedia research project, led by Núria Ferran-Ferrer, we ask how Wikipedia can be effectively accessed, edited and transformed by women. The main concern is that Wikipedia, one of the most used information and learning resources, further consolidates the gender gap.

Viquidones UPF (Wikipedia)

The project will begin by describing the presence or absence of female editors in Spain’s various Wikipedias: the Basque, Catalan, Galician and Spanish sites. It will analyse indicators including links, retention or persistence, and abandonment. As previous research has done for the English Wikipedia, the project will also analyse, for the four sites listed above, the decision-making processes that determine whether specific content is socially relevant, and can therefore be considered knowledge worthy of inclusion in Wikipedia, or is not relevant and will therefore not be visible. It will also study the foundations of the relevance criterion in relation to the sources of information on which Wikipedia articles are based, taking into account that women, and content relating to them or their interests, may encounter more difficulties when it comes to publication in the public sphere.

This project is being carried out with the active involvement of many groups and institutions, including the Wikimedia Foundation; Wikimujeres and Viquidones UPF, two groups of Wikipedians that support women’s involvement and visibility; Amical Wikimedia, the Wikimedia section for Catalonia; the Public Libraries of Barcelona; and the District Network of Public Libraries of Bogotá (Biblored).

This article was originally published on IN3 Blog on April 7, 2022.
David Ramírez-Ordóñez
Doctoral researcher

Predoctoral researcher in the UOC’s Information and Knowledge Society programme.

Julio Meneses
Full Professor

Professor of research methodology, director of Learning Analytics at the eLearning Innovation Center, and researcher at the Internet Interdisciplinary Institute of the Universitat Oberta de Catalunya.