Most Common False Positives With Originality

If you know that an article was produced by a human but Originality gives it a high AI score, this is a false positive.

False positives happen in about 1.56% of cases.

This article will explain the most common false positives and how to avoid them.
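To put the 1.56% figure in perspective, here is a rough arithmetic sketch in Python. It assumes each scan of human-written content is an independent event with the same false positive rate, which is a simplification and not something stated about how Originality works. The function names are made up for illustration.

```python
# Rough arithmetic on the false positive rate quoted above.
# Assumption: every scan is independent and has the same rate.
FALSE_POSITIVE_RATE = 0.0156  # "about 1.56% of cases"


def expected_false_positives(n_scans: int, rate: float = FALSE_POSITIVE_RATE) -> float:
    """Average number of human-written scans expected to be flagged."""
    return n_scans * rate


def prob_at_least_one(n_scans: int, rate: float = FALSE_POSITIVE_RATE) -> float:
    """Chance that at least one of n_scans human-written scans is flagged."""
    return 1 - (1 - rate) ** n_scans


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, expected_false_positives(n), round(prob_at_least_one(n), 4))
```

Under these assumptions, the more human-written pieces you scan, the more likely it is that at least one of them is a false positive, which is why a single high score should not be judged in isolation.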

Short Content

Scanned content should always be at least 100 words, and longer content will often be more accurate.
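If you want to check length before scanning, a minimal Python sketch could look like the one below. It simply counts whitespace-separated words, which may differ slightly from how Originality counts them, and the function names are hypothetical.

```python
# Pre-scan length check based on the 100-word minimum from this article.
# Assumption: a "word" is any whitespace-separated token.
MIN_WORDS = 100


def word_count(text: str) -> int:
    """Count whitespace-separated words in the text."""
    return len(text.split())


def long_enough_to_scan(text: str, min_words: int = MIN_WORDS) -> bool:
    """Return True if the text meets the recommended minimum length."""
    return word_count(text) >= min_words


if __name__ == "__main__":
    intro = "word " * 50  # stand-in for a 50-word intro
    print(word_count(intro), long_enough_to_scan(intro))
```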

To test this out, I wrote an intro to a blog post myself. It was 50 words long, and the AI score came out as 79%!

What happened here?

For one, the content was too short. Our AI detection software didn't have enough content to base its judgment on, so it gave my content a higher AI score than it should have.

But this content was also formulaic...

Formulaic Content

Content that follows a formula often gets flagged as having a high probability of being AI-generated. A "formulaic" piece of content could be an intro, a conclusion, a 5-paragraph essay, an article filled with statistics, or a scientific journal article.

Not all formulaic content will be flagged for AI, but all formulaic content is more likely to get a higher AI score.

Here's an example scan of formulaic content:

You can see that this content was 107 words, so it was short, but not too short. So why am I getting a 69% AI score if I wrote this myself, just now?

It's a conclusion to an article. In general, these are pretty formulaic: You summarize everything that came before, give the reader more options and places to click, and finish up with any final calls to action.

I wrote the intro and conclusion myself. But since they're both formulaic and relatively short, they are getting flagged as having a high probability of being AI-generated.

Scanning Public Domain Content

Another common false positive we see comes from scanning public domain content such as books, journals, or content on the internet.

This kind of content can often give false positives even if the content was written hundreds of years ago.

I scanned the first page of Pride and Prejudice, written by Jane Austen in 1813. It came back with a 67% probability of being written by AI.

How in the world can a work written over 200 years ago be considered AI?

The answer is simple: This is the kind of work that AI was trained to reproduce.

Most large language models today (such as ChatGPT, Bard, etc.) are trained on public domain books, papers, journals, and more. If the work is public domain, there is a good chance that some AI has been trained using this data.

This means that AI is trained to reproduce public domain works, so our AI detection algorithm, which is trained to recognize AI-generated text, can end up flagging public domain works as well.

It is possible that you will get a false positive by scanning public domain works, or content that has a similar style.

Not all public domain works get false positives: I also scanned parts of The Adventures of Sherlock Holmes and The Tragedy of Romeo and Juliet. Both scored 100% original, meaning human-written.

Your mileage may vary.

Lowering Your False Positive Rate

We recommend scanning content that is as long as possible. This helps with both the short content and the formulaic content problems, since longer content tends to be more unique than shorter content.

You can expect a higher AI score if you scan formulaic content such as an article with many statistics or a scientific journal. Public domain works or other work that an AI has been trained on will often give a higher score than usual.

The best defense against false positives is being aware of what causes them, scanning several works instead of a single piece, and scanning longer content.

We recommend scanning 3-5 works of at least 100 words from the same author for best results.
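As one possible way to apply this recommendation, the Python sketch below takes several works by the same author along with the AI scores you got from scanning them, drops anything under 100 words, and reports the median score only if at least 3 usable works remain. The function name, the use of the median, and the 0-to-1 score scale are all assumptions made for illustration; this is not how Originality itself combines results.

```python
from statistics import median
from typing import Optional

MIN_WORDS = 100  # minimum length recommended in this article
MIN_WORKS = 3    # lower end of the recommended 3-5 works


def summarize_author_scans(scans: list[tuple[str, float]]) -> Optional[float]:
    """Summarize several scans from one author.

    scans: (text, ai_score) pairs, where ai_score is the score you
    recorded for that text, expressed between 0 and 1 (assumed scale).
    Returns the median score of works with at least MIN_WORDS words,
    or None if fewer than MIN_WORKS such works are available.
    """
    usable = [score for text, score in scans if len(text.split()) >= MIN_WORDS]
    if len(usable) < MIN_WORKS:
        return None  # not enough long works to judge fairly
    return median(usable)


if __name__ == "__main__":
    long_text = "word " * 120
    short_text = "word " * 50
    scans = [(long_text, 0.10), (long_text, 0.79), (long_text, 0.20), (short_text, 0.95)]
    print(summarize_author_scans(scans))
```

The median is used here so that a single outlier, such as one formulaic conclusion with a high score, does not dominate the picture of an author's work.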

Updated on: 21/08/2023
