Characterizing initial human-AI proof formalization workflows

K.M.C. and S.F. contributed equally to this work. C.F., V.C., and I.S. contributed equally as senior authors. Correspondence: katiec@mit.edu, simon.frieder@wolfson.ox.ac.uk.

arXiv:2606.04273 PDF Explore survey data

Abstract

For centuries, human mathematicians have written proofs to substantiate their mathematical arguments, yet, the ability to automatically verify the validity of proofs has long been a challenge. Advances in AI systems’ ability to generate code and engage in increasingly high-level mathematical reasoning promise to transform the people’s ability to formalize and thereby verify proofs and even help with proof discovery itself. While many works focus on benchmarking the current frontier, we instead study how people construct their workflows with AI tools, even when such AI tools may be imperfect. We conduct a mixed-methods analysis into the initial impact of AI on people’s formalization workflows: what people claim they want, what they see as the barriers to those visions, and how they actually use and adapt AI in practice. A qualitative survey shows that preferences are diverse, but we observe a general desire for AI assistance in formalization that preserves high-level human control over the proof discovery process. To assess how people actually engage with AI for formalization under such limitations, we conduct a controlled user study in which participants formalize informal math problems and their proofs, with and without AI, across a range of mathematical problems at varying levels of difficulty and domains. Despite limitations of the tools at the time for autoformalization, participants tend to attain higher formalization accuracy when allowed access to AI tools than when formalizing on their own, with most participants flexibly choosing to use multiple different AI tools. Taken together, our work sheds light on the early stages of AI integration into formalization workflows, involving an intimate interplay of human and AI engagement.

Keywords Human-AI interaction · Theorem proving · Formalization · Human-AI workflows

Cite this work

@misc{collins2026characterizing,
  title         = {Characterizing initial human-AI proof formalization workflows},
  author        = {Collins, Katherine M. and Frieder, Simon and Bayer, Jonas and
                   Loader, Jacob and Lim, Jeck and Song, Peiyang and Zaiser, Fabian and
                   Zhou, Lexin and Li, Shanda and Looi, Sam and Tenenbaum, Joshua B. and
                   Bhatt, Umang and Weller, Adrian and Hernandez-Orallo, Jose and
                   Freer, Cameron E. and Chen, Valerie and Sucholutsky, Ilia},
  year          = {2026},
  eprint        = {2606.04273},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI}
}

Survey data explorer

User study data

Forthcoming!

User study problems and solutions will be made available upon full publication!