Attached Paper In-person November Annual Meeting 2026

The Future of the Past: Evaluating Large Language Model Reliability in Early Christian Historical Studies

Abstract for Online Program Book (maximum 150 words)

Large language models (LLMs) are increasingly used for historical and theological inquiry, yet their reliability in specialized scholarly domains remains underexamined. This paper presents a systematic empirical evaluation of LLM accuracy in early Christian studies, using two fourth-century figures as case studies: Macrina the Younger (c. 327-379 CE) and Olympias of Constantinople (c. 368-408 CE). These figures were selected to probe LLM behavior across axes of scholarly versus popular reception, source type, and gender representation. Using a structured benchmarking methodology that tests biographical accuracy, chronological precision, theological positioning, and source-critical reasoning across multiple models, we aim to identify consistent failure patterns, including factual conflation, hallucination, and what we term "association collapse": the systematic narration of women's significance through male contemporaries. We conclude with practical guidance for educators on integrating critical AI literacy into religious studies pedagogy and with a replicable framework for evaluating LLMs in other historical and theological contexts.