Wikipedia and gender: The deleted, the marked, and the unpolluted biographies

Abstract

Wikipedia, the self-described free encyclopedia, is available in more than 300 languages and is one of the most popular websites on the Internet. Despite its mission of collecting the sum of all knowledge, one of Wikipedia’s persistent struggles is its gender bias. In this paper we propose a corpus for analysing how biographies are generated in the English Wikipedia, in order to identify gender bias in the creation of new content meant to reflect the valid knowledge of all human beings. First, we identify a mechanism for accessing a corpus of deleted biographies and of biographies that have been accepted into the category Articles for Deletion, where editors vote in an online debate to keep, merge, redirect, or delete content. Then we access a second corpus, drawn from the category Scientists by field, from which we have chosen biographies marked as needing improvement because they lack bibliographic references, and biographies that have never been marked for improvement. In both cases we focused on science: from Articles for Deletion we selected scientists, and from Scientists by field we selected STEM scientists, so that we can compare how gender affects the development of content in Wikipedia. Lastly, we propose a path to understanding how the gender gap is generated in the collaborative creation of shared content. This entails a close look at the policies and guidelines of the digital encyclopedia, such as notability and reliable sources, which the community of editors created to shape the type of content accepted as valid knowledge.

Publication
Proceedings of 9th International Wiki Workshop 2022 at The Web Conference 2022 (Wiki Workshop 2022)

Full text PDF

Citation

Ramírez-Ordóñez, D., Ferran-Ferrer, N., & Meneses, J. (2022). Wikipedia and gender: The deleted, the marked, and the unpolluted biographies. Proceedings of 9th International Wiki Workshop 2022 at The Web Conference 2022 (Wiki Workshop 2022). https://wikiworkshop.org/2022/
David Ramírez-Ordóñez
Doctoral researcher

Predoctoral researcher in the Information and Knowledge Society programme at the UOC.

Julio Meneses
Associate Professor

Professor of research methodology, director of Learning Analytics at the eLearning Innovation Center, and researcher at the Internet Interdisciplinary Institute of the Universitat Oberta de Catalunya.
