Nearly 1 in 5 AI-Generated References Are Fake: How generative AI is polluting the scholarly record
By: Enago
Generative AI (Gen-AI) tools such as ChatGPT are rapidly becoming embedded in scholarly workflows, assisting researchers with literature discovery, summarization, and even reference list generation. While these systems offer undeniable efficiency gains, their use for reference and citation generation carries a substantial and underappreciated risk: hallucinated references.
What the Evidence Shows: Hallucinations at Scale
A 2025 study indexed in PubMed Central (PMC) provides one of the most comprehensive evaluations to date of AI-generated references. The authors systematically assessed citations produced by GPT-based systems and reported alarming results:
- 19.9% of AI-generated references were completely fabricated, with no traceable existence in the scholarly record.
- Among the remaining citations, 45.4% contained serious bibliographic errors, including incorrect author names, journal titles, publication years, volumes, or DOIs.
In effect, fewer than half of the references generated by the model were fully accurate and verifiable, a failure rate incompatible with scholarly standards. Crucially, the study also demonstrated that these errors were not random.
Another PMC-indexed study, which used ChatGPT to identify relevant nephrology papers, found that:
- 68% of the links in the references were incorrect
- 31% of the citations were fabricated
- 7% of the citations were incomplete
Fabricated citations often appeared plausible, mimicking real journals, legitimate-sounding article titles, and familiar author naming patterns. This makes hallucinations particularly difficult to detect without deliberate verification, especially for early-career researchers or interdisciplinary authors working outside their core domain.
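Deliberate verification can be as simple as resolving each DOI against a public registry before a reference list is trusted. The sketch below shows one minimal way to do this with the public Crossref REST API; the helper name, the example DOI, and the sample reference are illustrative placeholders, not items from the studies above.

```python
import requests

CROSSREF_API = "https://api.crossref.org/works/"

def check_doi(doi: str) -> dict:
    """Look up a DOI on Crossref and report whether it exists and what title is registered."""
    resp = requests.get(CROSSREF_API + doi, timeout=10)
    if resp.status_code == 404:
        # The DOI is not registered at all: a strong sign of a fabricated reference.
        return {"doi": doi, "exists": False, "registered_title": None}
    resp.raise_for_status()
    message = resp.json()["message"]
    title = (message.get("title") or ["<no title registered>"])[0]
    return {"doi": doi, "exists": True, "registered_title": title}

# Flag any DOI in an AI-generated reference list that does not resolve,
# or whose registered title differs from the title the model reported.
suspect_references = [
    {"doi": "10.1000/fake-doi-123", "claimed_title": "A Plausible but Invented Study"},  # hypothetical example
]
for ref in suspect_references:
    result = check_doi(ref["doi"])
    if not result["exists"]:
        print(f"NOT FOUND in Crossref: {ref['doi']}")
    elif result["registered_title"].lower() != ref["claimed_title"].lower():
        print(f"TITLE MISMATCH for {ref['doi']}: registry says '{result['registered_title']}'")
```

Even a rough check like this catches the most common hallucination patterns: DOIs that resolve to nothing, or to a real paper with a different title and authorship than the one claimed.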
A Governance Breakdown: A Publisher’s Credibility on the Line
The theoretical concern about hallucinated citations has, in 2025, manifested in a high-profile scholarly publishing failure. The recent Springer Nature textbook retraction is often described as an “AI citation scandal.” But framing it this way obscures the more uncomfortable truth: this was a governance breakdown, not a technological anomaly. Independent checks of the machine learning textbook Mastering Machine Learning: From Basics to Advanced discovered that two-thirds of the sampled citations either did not exist or were materially inaccurate. Several researchers listed in the references confirmed they had never authored the works attributed to them—classic markers of AI-style fabrication.
Following an integrity review, Springer Nature retracted the book and removed it from its catalogs after recognizing that 25 out of 46 listed references could not be verified. What makes this case particularly instructive is not the presence of AI-style hallucinations, but the fact that they passed through multiple institutional checkpoints:
- Author submission
- Editorial assessment
- Peer review
- Production and publication
This is not an isolated anecdote. Similar problems have reportedly occurred, and are unfortunately expected to grow, in other academic books purporting to cover social, ethical, and legal aspects of AI, where unverifiable citations have sparked publisher investigations.
These cases show that hallucinated references are no longer restricted to early-career papers or student submissions: they have infiltrated mainstream academic publishing, even at major multinational publishers.
Why Citation Hallucinations Expose Structural Weaknesses
The problem is not that generative models occasionally fabricate references. This behavior is well documented and technically predictable. The real issue is that academic systems continue to operate as if AI outputs are epistemically equivalent to human-authored content, without adapting policies, checks, or responsibilities accordingly.
When reference verification is implicitly delegated to authors under tight timelines, or assumed to be handled “somewhere” in the editorial process, hallucinations can enter the scholarly record unnoticed. Once published, they gain legitimacy through indexing, citation chaining, and reuse—making later correction far more difficult.
In this sense, fabricated citations represent a silent integrity risk: they do not always trigger plagiarism detectors, statistical checks, or ethical red flags, yet they directly undermine reproducibility and trust.
Why This Matters for Research Integrity
References are not cosmetic elements of a manuscript. They form the scholarly infrastructure. They enable:
- Verification of claims
- Reproducibility of methods
- Proper attribution of intellectual labor
- Cumulative knowledge building
When fabricated or corrupted references enter the scholarly record, they threaten each of these functions. Worse, they can persist undetected, being re-cited, indexed, and propagated by downstream authors, databases, and meta-analyses.
This creates a feedback loop where false citations gain apparent legitimacy over time, eroding trust not only in individual papers, but in the reliability of the academic corpus itself.
Moving Forward
Responsible AI adoption in research does not require abandoning generative tools—but it does demand clear boundaries, verification workflows, and institutional guidance. Organizations such as COPE, STM, and major journal publishers now emphasize that authors remain fully accountable for the accuracy of references, regardless of AI involvement.
Best-practice safeguards increasingly recommended by publishers, integrity bodies, and AI governance frameworks include:
- Prohibiting unverified AI-generated references in manuscripts
- Requiring authors to manually validate every citation produced with AI assistance (see the sketch after this list)
- Encouraging the use of AI only for reference organization or formatting, not generation
- Training researchers to recognize hallucination patterns
- Updating author guidelines and peer-review checklists to explicitly address AI-assisted citations
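For references that arrive without a DOI, a validation workflow might also search the registry by title and route anything without a close match back to the author for manual checking. The following is a rough sketch of that idea, again against the public Crossref API; the function name, the similarity threshold, and the example title are hypothetical.

```python
import difflib
import requests

def best_crossref_match(claimed_title: str) -> tuple[str, float]:
    """Search Crossref for a claimed title; return the closest registered title and a 0..1 similarity score."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": claimed_title, "rows": 3},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    best_title, best_score = "", 0.0
    for item in items:
        for title in item.get("title", []):
            score = difflib.SequenceMatcher(None, claimed_title.lower(), title.lower()).ratio()
            if score > best_score:
                best_title, best_score = title, score
    return best_title, best_score

# Anything below the threshold is not declared fake; it is simply flagged for human review.
for claimed in ["Deep Learning Approaches to Nephrology Triage: A Systematic Review"]:  # hypothetical title
    matched, score = best_crossref_match(claimed)
    status = "OK" if score >= 0.9 else "NEEDS MANUAL CHECK"
    print(f"{status} ({score:.2f}): '{claimed}' -> closest registry title: '{matched}'")
```

A script like this is a triage aid, not a substitute for author responsibility: borderline matches and works outside Crossref's coverage still require a human to confirm the source exists and says what the citation claims.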
As AI becomes embedded across the research lifecycle, citation hallucination represents one of the clearest stress tests for responsible AI governance. It exposes a fundamental tension between speed and scholarly rigor and emphasizes why human oversight is not optional.
Initiatives like Enago’s Responsible AI movement signal a shift from reactive responses to proactive stewardship: one that equips researchers to use AI thoughtfully, critically, and ethically.
Responsible AI in academia is not about rejecting innovation. It is about ensuring that efficiency never comes at the cost of truth and that progress is built on a foundation the scholarly community can rely upon. If left unchecked, AI-generated reference pollution risks weakening the very foundation of academic trust. If addressed responsibly, however, it can become a catalyst for better workflows, clearer policies, and more resilient research practices.