Most Common Reasons for False Positives With Originality

If you know that an article was written by a human but Originality gives it a high AI score, that is a false positive.

False positives happen in about 1.56% of cases.

This article explains the most common causes of false positives and how to avoid them.

Lowering Your False Positive Rate

You can lower your rate of false positives by:

Making sure that no tools were used to generate, edit, or plan your content. These tools are powered by AI and will increase our AI score. Even if you rewrite the content afterward, planned the content yourself, or only used AI for editing, it will still raise our AI score
Avoiding software such as Grammarly, ChatGPT, Quillbot, Microsoft Word Editor, and similar tools
Scanning as much of the content as possible (instead of scanning only a paragraph, the intro, or the conclusion)
Avoiding formulaic content (intros, conclusions, recipes, works cited pages)
Using our Chrome extension to "watch a writer write". It allows you to see the edit history of a Google Doc and run scans at each point along the edit history. You can see how the AI score changes over time

The most common false positives are caused by:

Tool usage
Short content
Formulaic content
Public domain works
Academic content

Let's take a look at each of these

Tool Usage

By far the most common cause of high AI scores on human-written content is the use of some sort of tool to help with the content. It might be Grammarly, ChatGPT, Quillbot, or others

The tool might be used only for planning, editing, sentence shortening, or rewriting (in other words, not for full content production), and it will still raise our AI score.

Using any tool, whether it's Grammarly, ChatGPT, Quillbot, or another, will raise your AI score. Unfortunately, all of these tools (and others) are powered by AI. Our software is sensitive enough to detect their use in your content and will raise the AI score accordingly

In general: if a tool interacted with your content in any way, it will raise the AI score. This isn't so much a false positive as us finding AI influence in your scan

Using any editing, writing, or even content planning tool can raise the AI score even if you rewrite the content

But a high AI score doesn't necessarily mean that AI produced all of the content, just that AI was used in some way to produce the content

Remember that our score is a confidence score. An AI score of 90% doesn't mean that AI produced 90% of the content. It means that our tool is 90% confident that AI was used in some part of producing the content

This could mean editing, rewriting, sentence shortening, paraphrasing, or perhaps even content planning

The first step we always take with these high scores is to reduce the usage of AI. I recommend scanning a version of your document from before the AI-powered tool was used.

This means scanning your content as it was before Grammarly made edits, before ChatGPT shortened the sentences, or before some other tool paraphrased it.

Even Microsoft Word Editor has become powerful enough in recent years to trigger our AI detector

If a tool was used, it will raise your AI score. But you can lower the score to a natural range by getting rid of the tool's influence
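If you have scanned several versions of the same document (for example, at different points in a Google Doc's edit history), a small script can help locate where the score rose. The following is only a sketch: the function name, the revision labels, and the scores are all made up for illustration, and it assumes you have already recorded each AI score by hand. It does not call any Originality.ai API.

```python
from typing import List, Optional, Tuple

# Hypothetical helper, not part of any Originality.ai product.
def largest_score_jump(
    history: List[Tuple[str, float]],
) -> Optional[Tuple[str, str, float]]:
    """Find the biggest rise in AI score between consecutive revisions.

    `history` holds (revision label, AI score in percent) pairs in
    chronological order. Returns (label before, label after, increase),
    or None if the score never rises.
    """
    best = None
    # Compare each revision with the one that came right before it.
    for (prev_label, prev_score), (label, score) in zip(history, history[1:]):
        increase = score - prev_score
        if increase > 0 and (best is None or increase > best[2]):
            best = (prev_label, label, increase)
    return best


# Made-up scores, recorded manually after scanning each saved version.
history = [
    ("first draft", 4.0),
    ("after self-edit", 6.0),
    ("after grammar tool", 71.0),
    ("final", 73.0),
]

jump = largest_score_jump(history)
if jump is not None:
    before, after, increase = jump
    print(f"Largest rise: {increase:.0f} points between '{before}' and '{after}'")
```

In this invented history, the rise points at the revision made right after a grammar tool was applied, which is the version you would want to compare against the earlier one.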

If you didn't use a tool and still got a high score, there might be some other causes:

Short Content - Over 100 Words Recommended

Scanned content should always be at least 100 words, and longer content will often be more accurate.

To test this out, I wrote an intro to a blog post myself. It was 50 words long, and the AI score came out as 79% AI

What happened here?

For one, the content was too short. Our AI detection software didn't have enough content to base its judgment on, so it scored my content higher than it should have.

We recommend scanning the entire piece of content that you have, not just parts of it.

You are more likely to receive false positives when you scan only part of a piece of content, such as a single paragraph, the intro, the conclusion, or half of the paper.

We can fix a lot of false positives by scanning the entire piece of content instead of a snippet. This just gives our tool more context and more content to go off of.
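If you scan content in bulk, it can help to check the length before scanning. Here is a minimal sketch of such a check. It assumes that splitting on whitespace is a close enough word count (the detector's own count may differ slightly), and the function names are hypothetical. The 100-word threshold is the recommendation from this article.

```python
MIN_WORDS = 100  # recommended minimum length from this article


def count_words(text: str) -> int:
    """Count words by splitting on whitespace (an approximation)."""
    return len(text.split())


def is_long_enough(text: str, min_words: int = MIN_WORDS) -> bool:
    """Return True if the text meets the recommended minimum length."""
    return count_words(text) >= min_words


# Stand-in for a 50-word intro like the one described above.
short_intro = "word " * 50

if not is_long_enough(short_intro):
    print(
        f"Only {count_words(short_intro)} words: "
        "scan the full piece instead of this snippet"
    )
```

A check like this only catches content that is too short. It says nothing about whether longer content is formulaic, which is covered next.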

Formulaic Content

Content that follows a formula often gets flagged with a high probability of AI. A "formulaic" piece of content could be an intro, a conclusion, a 5-paragraph essay, an article filled with statistics, or a scientific journal article.

Not all formulaic content will be flagged for AI, but all formulaic content is more likely to get a higher AI score.

Here's an example scan of formulaic content:

You can see that this content was 107 words, so it was short, but not too short. So why am I getting a 69% AI score if I wrote this myself, just now?

It's a conclusion to an article. In general, these are pretty formulaic: You summarize everything before, give the reader more options and places to click, and finish up with any final calls to action.

I wrote the conclusion myself. But since it's both formulaic and relatively short, it is getting flagged with a high probability of AI-generated content.

Scanning Non-Typical Content - Such As Older English & Classic Literature

Another common false positive we see comes from scanning public domain content such as books, journals, or other old content.

The reason is simple: Our AI was trained on modern writing. It's not that we think that Jane Austen is an AI, but that our AI is unfamiliar with older writing styles

It is possible that you will get a false positive by scanning old public domain works or content that has a similar style. This may include the Constitution, the King James Bible, or other content that is written in older English

It's worth noting that not all public domain works get false positives: I also scanned parts of The Adventures of Sherlock Holmes and The Tragedy of Romeo and Juliet. Both scored 100% original (human-written).

Your mileage with older works may vary. We recommend only using Originality.ai for the kind of content that you will be publishing: modern English.

Academic Content

Our tool is not built for academic content. We feel so strongly about this that we put this on our front page:

"Not for students"

AI detection tools have a lower accuracy with academic articles because academic articles often have all of the problems listed above:

Academic writing is competitive, so there is additional pressure to use tools like Grammarly to compete
Academic writing is often formulaic. Intros, conclusions, bibliographies, and works cited all follow a set formula. Bibliographies and works cited pages in particular can be difficult to classify as human-written or AI-written
Academic writing is non-typical for our tool. Originality's AI is mostly trained on web publishing and can be less accurate for academic writing

Using Originality.ai For Academics

If you do want to use our tool for academics and understand the risks, here's what we recommend:

Ask students to not use any tools for the most accurate results
Be aware that if students use any tool (including Grammarly, which is provided by many universities), it will increase our AI score
Scan several works from the same student in order to establish a "baseline" for this student. Whether the baseline is high or low, it will allow you to detect any change from the student's usual scores
Scan several works of your own in order to establish a baseline for yourself
Remove works cited pages and bibliographies from your scan as these are often formulaic
Manually check sources. AI finds it very difficult to correctly cite sources, and will often make things up. If a student has a high AI score but sources check out, then there is a high probability that the student wrote the paper
Please grade students with grace. It is dangerous to affect a student's entire life based on one tool from the internet. Be considerate and understand that our tool is not perfect
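The baseline idea in the list above can be sketched in a few lines. This is an illustration under stated assumptions, not a recommended grading rule: the function names are hypothetical, the scores are typed in by hand after scanning, and the 30-point margin is an arbitrary value chosen only for the example. The minimum of three earlier scans follows the 3-5 works recommended in this article.

```python
from statistics import mean
from typing import Sequence


def baseline_score(previous_scores: Sequence[float]) -> float:
    """Average AI score (in percent) across an author's earlier scans."""
    if len(previous_scores) < 3:
        raise ValueError("need at least 3 previous scans to form a baseline")
    return mean(previous_scores)


def deviates_from_baseline(
    new_score: float,
    previous_scores: Sequence[float],
    margin: float = 30.0,  # arbitrary illustrative margin, not an official value
) -> bool:
    """Return True if the new score differs from the baseline by more than `margin`."""
    return abs(new_score - baseline_score(previous_scores)) > margin


# Made-up scores from three earlier papers by the same student.
earlier = [10.0, 20.0, 30.0]

print(baseline_score(earlier))                # 20.0
print(deviates_from_baseline(85.0, earlier))  # True
print(deviates_from_baseline(40.0, earlier))  # False
```

A deviation is only a prompt to look closer (for example, by checking sources manually). It is not evidence on its own.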

Lowering Your False Positive Rate

We recommend scanning content that is as long as possible. This helps with both the short content and the formulaic content problems, since longer content tends to be more distinctive than shorter content.

You can expect a higher AI score if you scan formulaic content such as an article with many statistics or a scientific journal. Public domain works or other work that an AI has been trained on will often give a higher score than usual.

The best defense against a false positive is awareness of what causes them, scanning several works instead of a single piece of work, and scanning longer content.

We recommend scanning 3-5 works of at least 100 words from the same author for best results.

Updated on: 22/12/2023
