How Perceptual Hashing Can Be Used to Commit Fraud

Arkose Labs fights fraud by understanding its attackers

When a media company needs to detect duplicate media, whether images, songs or videos, it turns to a technique called perceptual hashing. The idea is to compute a compact fingerprint of the media that is reproducible and comparable. For an image, the fingerprint might be nothing more than the image shrunk down to a tiny size.
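As a minimal sketch of the "shrink it down" idea, the snippet below downsamples a grayscale image to 8x8 by block averaging and then records, for each cell, whether it is brighter than the overall mean. This is one common simple variant (often called an average hash), not a description of any particular product. The function names `shrink` and `average_hash`, the 8x8 size, and the list-of-rows image format are all assumptions made for illustration.

```python
def shrink(pixels, size=8):
    """Downscale a grayscale image (list of rows, values 0-255) to size x size.

    Assumes the image is at least `size` pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        row = []
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))  # mean brightness of the block
        small.append(row)
    return small


def average_hash(pixels, size=8):
    """Return a size*size-bit integer fingerprint of the image."""
    flat = [value for row in shrink(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


# A 32x32 horizontal gradient, and a copy with one pixel brightened.
image = [[x * 8 for x in range(32)] for _ in range(32)]
tweaked = [row[:] for row in image]
tweaked[5][5] += 20

print(average_hash(image) == average_hash(tweaked))  # True
```

The single-pixel change disappears in the block averaging, so both images produce the same fingerprint.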

In this example, small modifications to the image are lost in the downscaling, so only the large details are compared. This lets a perceptual hash detect that two pieces of media are similar and produce a score of how similar they really are. Instead of comparing millions of giant image files directly, which would be incredibly costly, this technique can get comparable results very quickly and cheaply.
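One common way to turn two fingerprints into a similarity score is to count the bits where they differ (the Hamming distance) and scale by the fingerprint length. The sketch below assumes fingerprints are stored as 64-bit integers; the names `hamming_distance` and `similarity` are illustrative, not taken from any specific library.

```python
def hamming_distance(a, b):
    """Number of bit positions where two integer fingerprints differ."""
    return bin(a ^ b).count("1")


def similarity(a, b, bits=64):
    """Score from 0.0 (every bit differs) to 1.0 (identical fingerprints)."""
    return 1.0 - hamming_distance(a, b) / bits


original = 0x0F0F0F0F0F0F0F0F
near_copy = 0x0F0F0F0F0F0F0F0E   # differs from original in one bit

print(hamming_distance(original, near_copy))  # 1
print(similarity(original, original))         # 1.0
```

Comparing two 64-bit integers this way is far cheaper than comparing the underlying image files pixel by pixel, which is where the speed advantage comes from.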

Of course, this example is the simplest form of perceptual hashing. More advanced forms can detect a wider range of modifications, but they are not as easy to explain.

The more advanced versions of this technique are used to protect online dating and social media platforms and to detect deepfakes, all in the interest of protecting users and companies on the internet.

The dark side of benevolent technologies

While this technology has allowed large media companies to detect fraudsters creating fake accounts, as well as copyright abuse by users uploading unauthorized content, it can also be used to build automated solvers that defeat enforcement challenges.

This is achieved by storing a database of correct answers and comparing each challenge image against it. A matching entry gives the attacker’s bot the correct answer.

These are the rough steps an attacker would take to use this method:

  1. Download thousands of images
  2. Label the data, choosing correct vs. incorrect answers
  3. Build an automated system that can use the website
  4. Compare the challenge image to the thousands of downloaded images using perceptual hashing, and solve the challenge without human input
  5. Continue on to perform malicious deeds

In response to this threat, we at Arkose Labs have developed techniques that shut this approach down, so our clients won’t see this style of attack get through.

Randomization

Every image we serve to end users goes through an array of randomization techniques, each designed to defeat a particular attack. These techniques are tailored to the individual puzzle, and the image is randomized in a way that specifically defeats a perceptual hash comparison.

With more advanced puzzles, which we use in cases of suspected automation, we use compositing: overlaying multiple transparent images to create a new image. This introduces random elements that throw off perceptual hashing, with far more variance in the possible images than would be practical for any sweatshop, let alone a single attacker, to download. The overlays could include tree branches, random noise backgrounds, or even other animals in the background that a legitimate user easily ignores. These random elements do not make for the nicest experience, however, so we reserve their use for suspected attackers.
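To illustrate why compositing disturbs a hash comparison, here is a toy sketch, not Arkose Labs' actual pipeline. It overlays a mostly transparent layer containing a few randomly placed opaque patches onto a base image and measures how far the fingerprint moves. The image format (lists of rows, with `None` meaning a transparent pixel), the helper names, and the patch sizes are all assumptions; a compact average-style hash is included so the sketch runs on its own.

```python
import random


def average_hash(pixels, size=8):
    """Compact average-style fingerprint: block-average to size x size, threshold at the mean."""
    height, width = len(pixels), len(pixels[0])
    flat = []
    for by in range(size):
        for bx in range(size):
            block = [pixels[y][x]
                     for y in range(by * height // size, (by + 1) * height // size)
                     for x in range(bx * width // size, (bx + 1) * width // size)]
            flat.append(sum(block) / len(block))
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    return bin(a ^ b).count("1")


def random_layer(width, height, rng, patches=3, patch=12):
    """Transparent layer (None = see-through) with a few opaque square patches."""
    layer = [[None] * width for _ in range(height)]
    for _ in range(patches):
        x0 = rng.randrange(width - patch + 1)
        y0 = rng.randrange(height - patch + 1)
        value = rng.choice((0, 255))
        for y in range(y0, y0 + patch):
            for x in range(x0, x0 + patch):
                layer[y][x] = value
    return layer


def composite(base, layer):
    """Overlay `layer` on `base`; opaque layer pixels replace base pixels."""
    return [[b if o is None else o for b, o in zip(base_row, layer_row)]
            for base_row, layer_row in zip(base, layer)]


base = [[x * 8 for x in range(32)] for _ in range(32)]
rng = random.Random(7)  # fixed seed only so the demo is repeatable
served = composite(base, random_layer(32, 32, rng))

# Number of fingerprint bits that changed because of the overlay.
print(hamming(average_hash(base), average_hash(served)))
```

Because the overlay changes both individual cells and the overall mean brightness, bits can flip even in regions the patches never touch, and a fresh random layer per request gives a different fingerprint each time.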

Artist Generated Content

While randomization is good, it can’t get us the whole way there, as a sufficiently persistent attacker may be able to gather enough samples to overcome it.

Using a pipeline that is, as far as we know, unique in the field, we employ 3D artists to generate hundreds of different 3D animals and objects, creating a continuous supply of assets that have never been seen before.

These assets are also tested against possible solver techniques so that we can measure how resistant each asset is to being learned via perceptual hashing and other techniques. We rank our puzzles and assets on both ease of use for legitimate users and estimated difficulty to automate.

The main weakness of a perceptual hashing attack is that, by the nature of how it works, it fails when presented with assets it has never seen before.
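The weakness can be seen in a small sketch of a fingerprint lookup. Matching only works when a stored fingerprint is within some distance of the query; a fingerprint from a never-before-seen asset is far from everything stored, so the lookup comes back empty. The function name `nearest_label`, the threshold of 10 bits, and the example fingerprints and labels are all hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def nearest_label(query, known, max_distance=10):
    """Return the label of the closest stored fingerprint, or None if nothing is close enough."""
    best = min(known, key=lambda stored: hamming(query, stored), default=None)
    if best is None or hamming(query, best) > max_distance:
        return None
    return known[best]


# Fingerprints collected from previously seen assets (made-up values).
known = {
    0x0F0F0F0F0F0F0F0F: "asset-A",
    0xFFFF00000000FFFF: "asset-B",
}

seen_before = 0x0F0F0F0F0F0F0F0E   # one bit away from asset-A
never_seen = 0x5555555555555555    # 32 bits away from both entries

print(nearest_label(seen_before, known))  # asset-A
print(nearest_label(never_seen, known))   # None
```

No amount of stored samples helps with the second query: the database can only answer for assets that resemble something already collected.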

A defense based on knowledge-gathering, not hope

At Arkose Labs, we recognize that a strong defense has to be active and grounded in an understanding of how attackers operate. This allows us to stay ahead of the game and create strategies that are proven to work in the field and that stop both human and bot-driven fraud, with certainty.

To learn how Arkose Labs defeats attacks, schedule a demo now.
