By: Enago

“Looks Cited” Doesn’t Mean “Is Cited.” Why 1 in 5 AI References Fail.

Generative AI (Gen AI) tools such as ChatGPT are rapidly becoming embedded in scholarly workflows, assisting researchers with literature discovery, summarization, and reference list generation. While these systems offer undeniable efficiency gains, their use for reference and citation generation carries an often overlooked risk: hallucinated references.

For researchers operating under intense publication pressures, this risk is no longer theoretical. It has direct implications for manuscript credibility, peer review outcomes, and long-term research integrity.

What the Evidence Shows: Hallucinations at Scale

A 2025 study indexed in PubMed Central (PMC) provides one of the most comprehensive evaluations to date of AI-generated references. The authors systematically assessed citations produced by GPT-based systems and reported alarming results:

  • 19.9% of AI-generated references were completely fabricated, with no traceable existence in the scholarly record.
  • Among the remaining citations, 45.4% contained serious bibliographic errors, including incorrect author names, journal titles, publication years, volumes, or DOIs.

In effect, fewer than half of the references generated by the model (roughly 44%, given the figures above) were fully accurate and verifiable, a failure rate incompatible with scholarly standards. Crucially, the study also demonstrated that these errors were not random.

Another study indexed in PMC, which evaluated ChatGPT's ability to identify relevant nephrology papers, found that:

  • 68% of the links in the references were incorrect
  • 31% were fabricated 
  • 7% of the citations were incomplete

In both studies, fabricated citations often appeared plausible, mimicking real journal names, legitimate-sounding article titles, and familiar author naming patterns. This makes hallucinations particularly difficult to detect without deliberate verification, especially for early-career researchers or interdisciplinary authors working outside their core domain.

Publisher Credibility on the Line

The theoretical concern about hallucinated citations has, in 2025, manifested in a high-profile scholarly publishing failure. The recent Springer Nature textbook retraction is often described as an “AI citation scandal.” But framing it this way obscures the more uncomfortable truth: this was a governance breakdown, not a technological anomaly. Independent checks of the machine learning textbook Mastering Machine Learning: From Basics to Advanced discovered that two-thirds of the sampled citations either did not exist or were materially inaccurate. Several researchers listed in the references confirmed they had never authored the works attributed to them—classic markers of AI-style fabrication. 

Following an integrity review, Springer Nature retracted the book and removed it from its catalogs after confirming that 25 of the 46 listed references could not be verified. What makes this case particularly instructive is not the presence of AI-style hallucinations, but the fact that they passed through multiple institutional checkpoints:

  • Author submission
  • Editorial assessment
  • Peer review
  • Production and publication 

This is not an isolated anecdote. Similar problems have reportedly occurred, and are unfortunately expected to increase, in other academic books purporting to cover the social, ethical, and legal aspects of AI, where unverifiable citations have sparked publisher investigations.

These cases underscore that hallucinated references are no longer confined to early-career papers or student submissions: they have infiltrated mainstream academic publishing, even at major multinational publishers.

Why Citation Hallucinations Expose Structural Weaknesses

Hallucinations happen because AI systems such as GPT models do not query live bibliographic databases; they generate text by predicting what would be plausible given patterns in their training data. As a result, a reference can sound entirely legitimate yet not exist in any research database.

The problem is not that generative models occasionally fabricate references. This behavior is well documented and technically predictable. The real issue is that academic systems continue to operate as if AI outputs are epistemically equivalent to human-authored content, without adapting policies, checks, or responsibilities accordingly.

In many research workflows, reference accuracy is implicitly assumed to be verified “somewhere” in the process, by co-authors, reviewers, or editorial teams often working under severe time constraints. In multi-author collaborations, responsibility for AI-assisted citations can become diffuse, allowing hallucinations to enter manuscripts unnoticed.

Once published, fabricated or corrupted references can rapidly gain legitimacy through indexing, citation chaining, and reuse in secondary literature. This creates a silent integrity risk: citation errors that evade plagiarism detection and statistical checks, yet directly undermine reproducibility, attribution, and trust.

Why This Matters for Your Next Paper

References are not cosmetic elements of a manuscript. They form the scholarly infrastructure. They enable:

  • Verification of claims
  • Reproducibility of methods
  • Proper attribution of intellectual labor
  • Cumulative knowledge building

When fabricated or corrupted references enter the scholarly record, they threaten each of these functions. Worse, they can persist undetected, being re-cited, indexed, and propagated by downstream authors, databases, and meta-analyses.

This creates a feedback loop where false citations gain apparent legitimacy over time, eroding trust not only in individual papers, but in the reliability of the academic corpus itself.

Potential Solutions

Responsible AI adoption in research does not require abandoning generative tools—but it does demand clear boundaries, verification workflows, and institutional guidance. Organizations such as COPE, STM, and major journal publishers now emphasize that authors remain fully accountable for the accuracy of references, regardless of AI involvement.

Instead of relying on AI to generate full citations, researchers should use AI tools to organize and format references that have already been verified. This streamlines workflows without risking incorrect or fabricated citations. Some best-practice safeguards increasingly recommended by publishers, integrity bodies, and AI governance frameworks include:

  • Incorporate reference validation checks early in the writing process, use citation verification tools, and involve experienced co-authors in the final verification of citations (a minimal example follows this list).
  • Check with your institution or publisher for updated guidelines on AI use in academic writing; if none exist, work with colleagues and peers to create and adopt internal protocols.
  • Train researchers and reviewers to recognize hallucination patterns.
  • Consider services or third-party tools that focus on AI-driven citation checking.
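As a concrete illustration of the first safeguard, the sketch below checks each DOI in a reference list against the public Crossref REST API. This is a minimal example under stated assumptions, not a definitive implementation: it assumes Python 3 with the requests library, and the function name and sample reference list are purely illustrative.

```python
# Minimal sketch: flag references whose DOIs do not resolve in Crossref,
# or whose registered title does not match the title being cited.
# Assumes Python 3 with the `requests` library and the public Crossref
# REST API (https://api.crossref.org); names here are illustrative.
import requests

CROSSREF_API = "https://api.crossref.org/works/"

def check_reference(doi: str, expected_title: str) -> str:
    """Classify a citation as VERIFIED, MISMATCH, or UNRESOLVED."""
    resp = requests.get(CROSSREF_API + doi, timeout=10)
    if resp.status_code != 200:
        # No record for this DOI: a classic marker of fabrication.
        return "UNRESOLVED: no Crossref record; possible fabrication"
    titles = resp.json()["message"].get("title") or []
    registered = titles[0] if titles else ""
    # A DOI can be registered yet point to a different work than the
    # citation claims, so compare the registered title as a rough check.
    if registered and expected_title.lower() in registered.lower():
        return "VERIFIED"
    return f"MISMATCH: DOI resolves to '{registered or 'untitled record'}'"

if __name__ == "__main__":
    # Hypothetical reference list: (DOI, title the manuscript cites).
    references = [
        ("10.1038/s41586-020-2649-2", "Array programming with NumPy"),
        ("10.1234/made.up.doi", "A Plausible-Sounding but Fabricated Paper"),
    ]
    for doi, title in references:
        print(doi, "->", check_reference(doi, title))
```

Note that a resolving DOI only proves the identifier is registered; comparing the registered metadata (here just the title, but in a fuller version also authors, journal, and year) against what the manuscript cites is what catches the bibliographic errors described earlier. References without DOIs would need to be checked manually or against databases such as PubMed.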

As AI becomes embedded across the research life cycle, citation hallucination represents one of the clearest stress tests for responsible AI governance. It exposes a fundamental tension between speed and scholarly rigor and underscores why human oversight is not optional.

Remember: Responsible AI in academia is not about rejecting innovation. It is about ensuring that efficiency never comes at the cost of truth and that progress is built on a foundation the scholarly community can rely upon. If left unchecked, AI-generated reference pollution risks weakening the very foundation of academic trust. If addressed responsibly, however, it can become a catalyst for better workflows, clearer policies, and more resilient research practices.

For researchers, this moment presents a choice: to use AI passively and absorb its risks, or to engage actively in shaping how AI is governed within scholarly work. Enago’s Responsible AI movement invites researchers to be part of this stewardship—by adopting evidence-based best practices, contributing to community-driven standards, and embedding verification-first thinking into everyday research workflows. By engaging with RUAI, researchers can help ensure that AI strengthens, rather than erodes, the credibility of the academic record—and that innovation in research remains both powerful and trustworthy. Researchers interested in contributing to or learning more about responsible AI practices can connect with the initiative through Enago’s inquiry form, creating a direct channel to explore guidance, resources, and collaboration opportunities tailored to real-world research workflows.