New AI detection tool measures how “surprising” word choices are

The system rarely flags human writing as AI generated, but is “rarely” good enough for use in schools?

A new AI detection tool is reportedly far less likely to falsely flag original human writing as being AI generated — but is “less likely” good enough for use in schools?

The challenge: It’s relatively easy these days for teachers to figure out if a student is guilty of plagiarism — they can just drop a sentence or two from a suspicious essay into Google to see if the text had been pulled from somewhere on the internet.

However, it’s far harder for them to figure out if the student “outsourced” their writing to a large language model (LLM), like ChatGPT — because these AIs generate brand-new content on demand, the text they produce isn’t going to show up in a web search.

“These tools sometimes suggest that human-written content was generated by AI.”

OpenAI

Some groups have developed AI detection tools they claim can tell whether a human or an AI wrote something, but ChatGPT developer OpenAI says they aren’t reliable enough, especially given that a false accusation of cheating could have lasting consequences for a student.

“One of our key findings was that these tools sometimes suggest that human-written content was generated by AI,” the company wrote, adding that a detector it trained itself gave an AI credit for writing Shakespeare and the Declaration of Independence.

What’s new? A team led by researchers at the University of Maryland (UM) has now developed a new AI detection tool, called Binoculars, which it says accurately identified more than 90% of AI-generated writing samples.

It also had a false-positive rate of just 0.01%, meaning that out of every 10,000 human-written samples it analyzed, only one was incorrectly flagged as AI generated.

For comparison, software company Turnitin’s AI detection tool, which was previously used by Vanderbilt, Michigan State, and several other major universities, has a false-positive rate of 1%, meaning it incorrectly flags roughly one out of every 100 human-written papers as AI generated.

Vanderbilt stopped using the tool because this false positive rate was high enough that, over the course of an academic year, it would incorrectly flag hundreds of student essays as AI creations.
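
To make the gap between those two rates concrete, here is a small back-of-the-envelope calculation in Python. The volume of 75,000 human-written papers per academic year is an assumption chosen purely for illustration; only the two false-positive rates come from the article.

```python
# Back-of-the-envelope comparison of expected false flags at the two rates.
# The paper volume is a hypothetical figure used only for illustration.
papers_per_year = 75_000  # assumed number of human-written essays checked per year

false_positive_rates = {
    "Turnitin-style tool (1% false-positive rate)": 0.01,
    "Binoculars (0.01% false-positive rate)": 0.0001,
}

for tool, fpr in false_positive_rates.items():
    expected_false_flags = papers_per_year * fpr
    print(f"{tool}: ~{expected_false_flags:g} essays wrongly flagged per year")

# Prints roughly:
# Turnitin-style tool (1% false-positive rate): ~750 essays wrongly flagged per year
# Binoculars (0.01% false-positive rate): ~7.5 essays wrongly flagged per year
```

At that assumed volume, a 1% rate produces hundreds of baseless flags a year, while a 0.01% rate produces only a handful.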

How it works: Binoculars looks at a piece of writing through two “lenses.” 

The first is an “Observer” LLM, which is used to measure “perplexity,” or how unpredictable a text is. LLMs are trained on vast amounts of published material and generate text by predicting which word is most likely to come next, so the text they write tends to have lower perplexity scores than human-written content.

The second is a “Performer” LLM. It predicts what the next word should be at every point in the text, based on the words that came before it — essentially doing what an AI like ChatGPT would do. The Observer AI then measures the perplexity of the Performer’s choices. 

If there’s little difference between the two scores, meaning the text is about as predictable to the Observer as the Performer’s own word choices, Binoculars predicts that the text was likely written by AI.
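
For the technically curious, here is a rough sketch of what that two-lens comparison could look like in code. It uses the Hugging Face transformers library, with GPT-2 standing in for the Observer and DistilGPT2 for the Performer (two small models that share a vocabulary). The model choices, the score formula, and the way the ratio is interpreted are simplifying assumptions for illustration, not the researchers’ actual implementation.

```python
# A minimal sketch of a Binoculars-style two-model score, assuming the
# Hugging Face transformers library. GPT-2 / DistilGPT2 are stand-ins for
# the Observer and Performer; the researchers' real model pair, score
# normalization, and decision threshold differ.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
observer = AutoModelForCausalLM.from_pretrained("gpt2").eval()
performer = AutoModelForCausalLM.from_pretrained("distilgpt2").eval()

@torch.no_grad()
def binoculars_style_score(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    obs_logits = observer(ids).logits[0, :-1]    # Observer's next-word predictions
    perf_logits = performer(ids).logits[0, :-1]  # Performer's next-word predictions
    targets = ids[0, 1:]                         # the words actually used in the text

    # Lens 1: how surprising the text's actual word choices are to the Observer
    # (average per-token cross-entropy, i.e. log-perplexity).
    log_ppl = F.cross_entropy(obs_logits, targets)

    # Lens 2: how surprising the Performer's predicted word choices are to the
    # Observer, averaged over every position in the text.
    perf_probs = F.softmax(perf_logits, dim=-1)
    obs_log_probs = F.log_softmax(obs_logits, dim=-1)
    log_cross_ppl = -(perf_probs * obs_log_probs).sum(dim=-1).mean()

    # The lower the ratio, the closer the text is to what a model would have
    # produced; a score near or below a tuned threshold is read as AI generated.
    return (log_ppl / log_cross_ppl).item()

print(binoculars_style_score("The quick brown fox jumps over the lazy dog."))
```

In a sketch like this, the two models have to share a tokenizer so their predictions can be compared word by word; treat it only as an illustration of the two-lens idea described above.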

Schools might be hesitant to use an AI detection tool that could deliver any false positives.

Looking ahead: Binoculars worked on a variety of sample types, including news articles and student essays, and on text generated by several AIs, including OpenAI’s ChatGPT and Meta’s LLaMA-2-7B, which could make it more useful than other, narrower AI detection tools.

The research still needs to be peer reviewed, but even if it holds up under scrutiny, schools might be hesitant to use Binoculars due to the risk of any false positives, even though that risk is far lower than with currently available AI detection tools. There’s also the question of how long such a tool would remain effective if AI models are tuned to write less predictably, specifically to evade this kind of checker.

Binoculars researcher Abhimanyu Hans told Business Insider his own team is “conflicted” about whether their system should be used by schools, but they do believe it could be valuable for other applications, such as detecting AI-written content on websites and social media platforms.

As for where that leaves teachers, their only option may be to rework their curriculums to accommodate LLMs, rather than trying to punish kids for using the powerful new tools.

We’d love to hear from you! If you have a comment about this article or if you have a tip for a future Freethink story, please email us at [email protected].
