Published: June 8, 2026

When Your Writing Looks Like AI Because You Learned English Second

Item: Why Non-Native English Writing Gets Flagged as AI — And What to Do
Author: Enago

A 2023 Stanford study put 91 TOEFL essays written by non-native English students through seven leading AI detection tools. On average, the detectors flagged 61% of these human-written essays as AI-generated. That is well over half.

This is the equity problem at the heart of AI detection in academic publishing. For decades, non-native English researchers have used grammar checkers, professional editing services, and language tools to compete on equal footing with native speakers. Generative AI was the natural next step. But the same statistical patterns that characterize AI-assisted writing, including simpler vocabulary and predictable sentence structure, also describe how someone writing in a second language tends to write. Detection tools cannot tell the two apart.

The result is a fairness crisis that disproportionately punishes the researchers who already faced the steepest barriers to publication.

What the Stanford Finding Tells Us

The original study by Liang, Yuksekgonul, Mao, Wu, and Zou (2023) tested seven publicly available GPT detectors on TOEFL essays and US student essays. The TOEFL essays were misclassified as AI-generated at a 61.3% average rate. On about 19.8% of the TOEFL essays, all seven detectors agreed the writing was AI-generated, even though no AI was involved. By contrast, native English student essays were correctly classified almost every time.
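To make the two headline figures concrete, here is a minimal Python sketch of how an average false positive rate and a unanimous-flag rate can be computed from per-detector verdicts. The verdict table is invented toy data with three detectors and four essays, not the study's data, and the function names are hypothetical.

```python
from typing import List


def average_false_positive_rate(verdicts: List[List[bool]]) -> float:
    """Share of human-written essays flagged, averaged across detectors.

    verdicts[i][j] is True when detector j flagged human-written essay i.
    """
    n_essays = len(verdicts)
    n_detectors = len(verdicts[0])
    per_detector = [
        sum(row[j] for row in verdicts) / n_essays for j in range(n_detectors)
    ]
    return sum(per_detector) / n_detectors


def unanimous_flag_rate(verdicts: List[List[bool]]) -> float:
    """Share of human-written essays that every detector flagged."""
    return sum(all(row) for row in verdicts) / len(verdicts)


# Toy data (assumption): rows are essays, columns are detectors.
toy_verdicts = [
    [True, True, True],
    [True, False, True],
    [False, False, True],
    [False, False, False],
]

print(average_false_positive_rate(toy_verdicts))
print(unanimous_flag_rate(toy_verdicts))
```

The two numbers answer different questions: the first describes how often a typical detector is wrong, the second how often every detector is wrong about the same essay.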

The mechanism is straightforward. Non-native English writers tend to produce text with lower lexical diversity, simpler syntactic patterns, and more predictable word choices. AI-generated text also has these properties because language models optimize toward statistically likely outputs. Detectors trained to spot AI patterns end up flagging anything that looks predictable, regardless of who wrote it.
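Lexical diversity, one of the features named above, can be approximated with a type-token ratio: the number of distinct words divided by the total number of words. The sketch below is a toy illustration under that assumption. It is not how any particular detector works, and the two sample texts are invented.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct words divided by total words (1.0 means no word repeats)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Invented samples: one repeats its vocabulary, the other does not.
repetitive = (
    "The results are good. The results are clear. The results are important."
)
varied = (
    "Our findings look promising. Every measurement agrees. "
    "Further replication remains essential."
)

print(type_token_ratio(repetitive))
print(type_token_ratio(varied))
```

A lower ratio means more repeated vocabulary, which is one simple way text can look more predictable.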

The Stanford team made the implication explicit: the detectors penalize non-native writers for the exact features that come from learning English as a second language.

How This Plays Out in Academic Publishing

The dermatology literature gives a useful real-world test case. A 2025 study published in JAAD International by Wang et al. analyzed Research Letters from 2020 (before ChatGPT) and 2024 (after ChatGPT). They found that letters from non-native English authors in 2024 were flagged at a higher rate than letters from US authors. The 2020 letters showed no major gap.

The same study ran a second test. The team took human-written research letters and used ChatGPT to polish them for readability and flow, with no factual or substantive changes. Before polishing, 97% to 100% of the letters were classified as human-written. After polishing, 75% to 85% were flagged as AI-generated, with 15% to 25% labeled at high confidence. A different detection tool showed the same pattern.

The implication for non-native English researchers is sharp. Using AI to clean up grammar and flow, the exact use case most journal policies explicitly allow, dramatically raises the chance that the manuscript will trigger an AI detection flag.

The Volume Problem at Scale

A November 2025 large-scale analysis by Liu, He, Zheng, Bu, and Ni examined more than 2 million biomedical papers on PubMed Central from 2021 to 2024. They estimated AI-assisted writing adoption grew approximately 400% in non-English-speaking countries, compared to 183% in English-speaking countries. Adoption was highest among less-established researchers, those at lower-ranked institutions, and those in early career stages.

The researchers who benefit most from AI writing assistance are also the researchers most likely to be flagged by detection tools. The same study found a modest narrowing of the publication gap between scientists from English-speaking and non-English-speaking countries when AI adoption was higher. This is the equity gain that detection bias threatens to reverse.

The Career Consequences Are Real

False AI flags in academic publishing carry consequences beyond a single rejection. Editorial suspicion of AI use can trigger desk rejection, integrity review, or in some cases retraction. The August 2025 update to COPE retraction guidelines explicitly includes "undisclosed involvement of artificial intelligence" as grounds for retraction. A non-native English researcher whose manuscript is incorrectly flagged faces the same downstream risk as a researcher who deliberately concealed AI use.

The pattern is already documented in higher education. Australian Catholic University used Turnitin's AI detection tool to accuse approximately 6,000 students of academic misconduct in 2024. About a quarter of the cases were later dismissed after investigation. Any case relying solely on Turnitin's AI detector as evidence was dismissed immediately. The university later abandoned the tool. The damage to students caught in the process took months to undo.

In academic publishing, where reviewer time is scarce and editorial judgment moves quickly, the equivalent process unfolds without a formal misconduct hearing. A manuscript flagged by a detector during initial screening often receives a rejection email with no opportunity to contest the flag.

Why Institutions Are Walking Away from AI Detectors

The institutional response has been telling. Vanderbilt University disabled Turnitin's AI detection tool in August 2023, citing both the false positive rate and the documented bias against non-native English speakers. Vanderbilt's calculation: with 75,000 papers submitted to Turnitin in 2022 and a claimed 1% false positive rate, roughly 750 student papers would have been incorrectly labeled.
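The arithmetic behind that estimate is a single multiplication, sketched here in Python. The function name is hypothetical, and the calculation assumes, as the estimate above does, that the claimed rate applies uniformly to every submitted paper.

```python
def expected_false_positives(papers_submitted: int, false_positive_rate: float) -> float:
    """Expected number of human-written papers wrongly labeled as AI-generated.

    Assumption: every submitted paper is human-written and the rate is uniform.
    """
    return papers_submitted * false_positive_rate


# Figures from the text: 75,000 papers and a claimed 1% false positive rate.
estimate = round(expected_false_positives(75_000, 0.01))
print(estimate)  # 750
```

Even a rate that sounds small turns into hundreds of wrongly labeled papers once the submission volume is large.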

The University of Pittsburgh, Michigan State University, Northwestern University, and the University of Texas followed with similar decisions. The pattern is consistent. Institutions that examined the data on detector reliability concluded the tools were not accurate enough for high-stakes use, and the bias against non-native English writers was a primary reason cited.

Academic journals have been slower to follow. Many still rely on detection tools as part of integrity screening. Researchers writing in English as a second language carry the cost of this lag.

What Non-Native English Researchers Should Do Now

The fairness problem is real, but the practical defenses are within reach. Five steps reduce the risk of false flags during manuscript submission.

Step 1: Disclose AI use clearly and specifically. A clean disclosure statement that names the tool, the version, and the specific task (language polishing, grammar checking, terminology consistency) reframes any later flag as known and approved use. Detection tools cannot prove a flag is correct, but undisclosed use is what triggers integrity reviews. Disclosed use is harder to penalize. For a structured way to draft a publisher-aligned disclosure, Enago's AI Disclosure Statement Generator walks through the required fields.

Step 2: Keep your drafting trail. Save versions of your manuscript at each stage. A timestamped Word document or a Google Docs version history showing the manuscript evolved over weeks of writing is concrete evidence of human authorship. If a journal raises concerns, this trail answers them.

Step 3: Use professional human editing where it matters. For high-stakes submissions, professional editing by a human editor produces text that does not trigger AI-pattern detectors the same way machine polishing does. Human editing also catches issues automated tools miss. Enago's editing services work specifically with non-native English researchers preparing manuscripts for international journals.

Step 4: Vary sentence structure during your final pass. Detection tools flag predictable, uniform writing. A final read-through that mixes sentence lengths, breaks up parallel constructions, and varies opening words moves the writing away from the patterns that trigger false positives. This is also what good academic prose looks like.

Step 5: If your work is wrongly flagged, contest it directly. Editors are generally aware of the documented bias against non-native English writers. A clear, polite response that names the Stanford and JAAD studies, presents your drafting trail, and notes the disclosure of any AI assistance is often enough to resolve the concern. Detection tool output alone cannot prove AI authorship.
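For authors who write outside tools with built-in version history, the drafting trail from Step 2 can be kept with a small script. The following is a minimal Python sketch, not a standard or journal-endorsed procedure: the function name, the `snapshots.log` file, and the folder layout are all hypothetical. Each call copies the current draft under a UTC-timestamped name and records a SHA-256 hash of its contents.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def snapshot_draft(draft: Path, archive_dir: Path) -> Path:
    """Copy the draft into archive_dir under a timestamped name and log its hash."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    copy_path = archive_dir / f"{draft.stem}_{stamp}{draft.suffix}"
    shutil.copy2(draft, copy_path)  # copy2 also preserves the modification time
    digest = hashlib.sha256(copy_path.read_bytes()).hexdigest()
    # Append one line per snapshot: timestamp, content hash, file name.
    with (archive_dir / "snapshots.log").open("a", encoding="utf-8") as log:
        log.write(f"{stamp}  {digest}  {copy_path.name}\n")
    return copy_path


# Example call (hypothetical file names):
# snapshot_draft(Path("manuscript.docx"), Path("draft_history"))
```

Locally stored timestamps can be edited, so a trail like this supports an account of how the manuscript evolved rather than proving it. Version history held by a third-party service, such as the Google Docs history mentioned in Step 2, is harder to dispute.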

What Journals and Publishers Should Do

The fix is not on the researchers alone. Journals and publishers carry responsibility for using detection tools fairly. Three changes would shift the burden in the right direction.

First, treat AI detection tool output as one signal among many. The Vanderbilt analysis and the Stanford study together establish that detection scores cannot be treated as evidence on their own. A flag should trigger a closer look, never an automatic rejection.

Second, publish the false positive rates and bias data for the specific tools the journal uses. Researchers should know what they are being measured against. Most journals do not currently disclose which detection tools they use or how the output factors into editorial decisions.

Third, give flagged authors a chance to respond before any rejection or integrity action. The detection tool is sometimes wrong. The author might have already disclosed AI use in an earlier section the editor missed. A short query before action prevents the most serious unfair outcomes.

The Deeper Problem AI Detection Cannot Solve

The Stanford finding is now more than two years old. The follow-up studies, the institutional responses, and the JAAD data all point in the same direction. AI detection tools cannot reliably distinguish AI-generated text from text written by humans whose first language is not English. This is not a flaw in any single tool. It is a structural problem in the underlying approach.

For non-native English researchers, the practical reality is unchanged from before ChatGPT. Strong scholarship plus careful language work plus transparent disclosure remains the path to publication. The tools have changed. The standard has not. What has changed is the burden of proof. Researchers writing in English as a second language now carry an extra step: showing that their writing came from them, not from a machine that happens to write the way they do.

That extra step is unfair. It is also, for now, the cost of submitting a manuscript in a system that has not yet caught up with what its own detection tools cannot do.