Attached Paper In-person November Annual Meeting 2025

Generating Stub Articles about Women in Religion: An Experiment in Retrieval Augmented Generation and Fine-Tuning LLMs

Description for Program Unit Review (maximum 1000 words)

This paper describes an experiment to generate stub articles about women religious leaders using a purpose-built artificial intelligence system as a means to address gender imbalances on Wikipedia.

The Women in Religion User Group is an officially recognized Wikimedia Movement Affiliate that “seeks to create, update, and improve Wikipedia articles pertaining to the lives of cis and transgender women scholars, activists, and practitioners in the world's religious, spiritual, and wisdom traditions” (Women in Religion 2025). The Women in Religion User Group has its roots in the Women's Caucus of the American Academy of Religion and the Society of Biblical Literature. Since its start in 2018, the user group has held dozens of edit-a-thons to train new editors and has expanded internationally with affiliate groups in Africa and Australia.

Introducing new editors to Wikipedia can be challenging, particularly when working in areas where standards for notability are contested. A stub is a brief Wikipedia article that meets the minimal threshold for notability. By creating stubs about women who do not have articles in Wikipedia, we provide a foothold for new editors. Since disagreements about notability can result in the deletion of articles, we aim to encourage new editors by providing them with stubs about women religious leaders to which they can easily contribute new content.

The creation of these stubs still involves significant research and writing, however, and our small base of experienced editors cannot generate them as quickly as our community would like. The roots of this experiment trace back to 2023, when a subgroup of the editors involved in Women in Religion began to experiment with artificial intelligence tools to explore whether they could redress the gender imbalances on Wikipedia.

Like so many early users of generative AI, the group quickly discovered the problem of confabulations (now the preferred term for so-called AI “hallucinations”); when the LLM did not have information about an individual, it tended to make up plausible but false facts about her. Rather than give up the attempt, we began to explore ways to make LLMs more reliable and less prone to error. We ultimately built a system to improve both the content and the form of our stubs.

In the early stages of the project, we explored the use of retrieval augmented generation (RAG) to improve the veracity of the stubs that the LLM generated. RAG refers to systems that incorporate external knowledge bases into large language models to provide them with contextual information that they did not learn during pre-training. To develop a RAG system, we tokenized articles by and about figures of interest, added them to vector databases (Pinecone and Weaviate), and connected them to GPT-4 using frameworks such as LangChain and LlamaIndex.
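
To make this pipeline concrete, the sketch below illustrates the retrieval-augmented pattern in miniature: passages are embedded, the passages most relevant to a request are retrieved, and the model is asked to generate a stub grounded only in that retrieved context. The in-memory index, model names, and prompts are illustrative stand-ins for the Pinecone/Weaviate and LangChain/LlamaIndex components used in the project itself.

```python
# Minimal illustration of retrieval augmented generation (RAG).
# An in-memory index and the OpenAI client stand in for the project's
# Pinecone/Weaviate stores and LangChain/LlamaIndex orchestration.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# 1. Index passages from articles by and about the figure of interest.
passages = ["...biographical passage 1...", "...biographical passage 2..."]
index = embed(passages)

# 2. Retrieve the passages most similar to the request (cosine similarity).
query = "Write a Wikipedia stub about [name], a religious leader."
q = embed([query])[0]
scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
context = "\n\n".join(passages[i] for i in scores.argsort()[::-1][:3])

# 3. Generate the stub, grounding the model in the retrieved context.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Write a neutral Wikipedia stub using only the sources provided."},
        {"role": "user", "content": f"Sources:\n{context}\n\nTask: {query}"},
    ],
)
print(reply.choices[0].message.content)
```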

In the current phase of the project, we are fine-tuning an open-source large language model to improve its ability to create Wikipedia stubs. Fine-tuning refers to the process of adjusting a pre-trained model for more specific purposes. In the case of large language models, technologies for fine-tuning have emerged that require only a few hundred data points, making them accessible to non-professional users who wish to customize their models to perform particular tasks. Our goal in fine-tuning a model is to train it to produce stubs with the proper form, including infoboxes and categories. Drawing on the labor of Women in Religion editors, who identified relevant existing stubs about women religious leaders, we curated a synthetic dataset by asking an LLM to produce an appropriate prompt for each of the stub articles we identified. This dataset is now available under an open-source license on HuggingFace for others to use.
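
To suggest how lightweight this kind of fine-tuning can be, the sketch below pairs each synthetic prompt with the human-written stub it should yield and trains a LoRA adapter on an open-weight base model using the Hugging Face datasets, peft, and transformers libraries. The file name, field names, model identifier, and hyperparameters are placeholders rather than our actual configuration.

```python
# Illustrative fine-tuning sketch: train a LoRA adapter so an open-weight model
# learns the form of a Wikipedia stub (infobox, categories) from prompt/stub pairs.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "mistralai/Mistral-7B-v0.1"  # placeholder: any small open-weight base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Wrap the base model with a low-rank adapter so only a small set of weights is trained.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

# Each record pairs an LLM-generated prompt with the human-written stub it should yield.
dataset = load_dataset("json", data_files="stub_pairs.jsonl")["train"]  # placeholder file

def tokenize(example):
    text = f"### Prompt:\n{example['prompt']}\n\n### Stub:\n{example['stub']}"
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="stub-writer", num_train_epochs=3,
                           per_device_train_batch_size=1, learning_rate=2e-4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```

Because only the adapter weights are updated, training runs of this kind can work with a few hundred prompt/stub pairs on modest hardware, which is what makes the approach accessible to non-professional users.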

We conclude our talk by discussing how effective these techniques have been in practice while also raising two key ethical questions about which we hope to receive comment and feedback.

The first question concerns the ethics of releasing our project in open source. While we are designing this technological system to advance gender justice online, we recognize that the same tools could also be used for different purposes. Some of these purposes would be innocuous, such as generating thousands of stubs about semi-professional football players or musicians; others may be highly problematic, such as using the tool to spam Wikipedia with stubs about commercial products. We intend to mitigate such problematic uses by requiring users of our fine-tuned model to agree to a code of conduct on HuggingFace, though we recognize that this will not deter those with malign intent.

A related ethical issue arises from the effect of using AI tools within the Wikimedia ecosystem. The Wikipedia community is still developing norms and best practices for the use of AI on its platform (Wikipedia contributors 2025). In our case, we do not plan to deploy articles directly to the so-called “article space” on Wikipedia, but only to the “draft” namespace, where human editors will review and fact-check them before publishing them. We also aim to be transparent about edits made by AI. That said, we are concerned that by gaining efficiency in the generation of stub articles we may inadvertently be increasing the burden on other editors. As Joshua Ashkinaze et al. remark, we may find that AI tools are “increasing moderator burden if moderators need to constantly check that the LLM changes are not hallucinations” (Ashkinaze et al. 2024, 20).

As a work-in-progress, this paper will report on the state of the project in fall 2025. We anticipate that a beta version of the fine-tuned model and RAG system will be available for others to use, subject to agreement with our code of conduct document on HuggingFace.

Works Cited

Ashkinaze, Joshua, Ruijia Guan, Laura Kurek, Eytan Adar, Ceren Budak, and Eric Gilbert. 2024. "Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms." arXiv preprint arXiv:2407.04183. https://arxiv.org/abs/2407.04183

Wikipedia contributors. 2025. "Wikipedia: Artificial Intelligence." Wikipedia, The Free Encyclopedia. Last modified March 9, 2025. https://en.wikipedia.org/wiki/Wikipedia:Artificial_intelligence.

Women in Religion. 2025. "Women in Religion User Group." Last modified March 9, 2025. https://meta.wikimedia.org/wiki/Women_in_Religion_User_Group.

Abstract for Online Program Book (maximum 150 words)

This paper describes an experiment to generate stub articles about women religious leaders using a purpose-built artificial intelligence system as a means to address gender imbalances on Wikipedia. The Women in Religion User Group is an officially recognized Wikimedia Movement Affiliate that “seeks to create, update, and improve Wikipedia articles pertaining to the lives of cis and transgender women scholars, activists, and practitioners in the world's religious, spiritual, and wisdom traditions” (Women in Religion 2025). In the early stages of the project, we explored the use of retrieval augmented generation (RAG) to improve the veracity of the stubs that the LLM generated. In the current phase of the project, we are fine-tuning an open-source large language model to improve its ability to create Wikipedia stubs. After reviewing these techniques, we discuss their effectiveness while also raising ethical questions about releasing our project in open source.