The Knights & Knaves of Instagram: Scammers vs. Vigilantes


This blog is adapted from the white paper, Post Grams Not Scam: Detecting Money-Flipping Scammers on Instagram Using Machine Learning.

The ZeroFox research team recently built a machine learning classifier to detect scammers on social media. The focus of the study was money-flipping financial scams on Instagram. These scams lure victims into sending money or disclosing banking information: the scammer promises to “flip” the victim’s money and return a huge profit. Scammers advertise their services on Instagram with pictures of money, luxury goods and drugs, and they hijack bank hashtags to target those banks’ customers. They particularly target the poor and members of the military. The banks often eat the cost, so the scams cause considerable financial losses for consumers and banks alike.

Example of a typical money-flipping scam on Instagram.

During the study, we built a machine learning classifier to label posts as either benign or malicious. But our algorithm had a problem: it kept labelling a certain type of post as malicious when the post was in fact benign. These were posts from do-good vigilante Instagram users attempting to expose the scammers. We dubbed them “The Knights of Instagram.”

A Knight in action, attempting to expose a scammer.

To a machine, this looks like a scam post. It deliberately uses the same words, hashtags and emojis. It literally uses the same image. What’s a predictive classifier to do?

As we investigated this phenomenon, we stumbled on something even more interesting. Some scammers picked up on the Knights’ tactics and adapted their own techniques. They would imitate Knight accounts and expose other scammers to gain legitimacy with potential victims. We dubbed this group “The Knaves of Instagram.” Once again, Knaves proved to share many commonalities with both Knights and traditional scammers, making it extremely difficult for our initial model to separate them properly (spoiler warning: we got it figured out!).

A Knave adopting Knight tactics to perpetrate a scam.

But herein lies the beauty of machine learning: once we had a pre-classified training set of Knights and Knaves, the algorithm solved its own problem. In retraining, it was able to identify the subtle differences in features and began classifying Knights and Knaves correctly. It was only after the algorithm had identified the key features that we as humans came to understand the subtleties as well.
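To make the retraining idea concrete, here is a minimal sketch in Python. It is not the ZeroFox model: the white paper's features and algorithm are not described in this post, so the sketch swaps in a tiny bag-of-words Naive Bayes classifier, and every post, label and function name in it is made up for illustration. It only covers the Knight false-positive case; telling Knaves apart would need richer features than word counts.

```python
import math
from collections import Counter

def tokenize(text):
    # Whitespace split keeps hashtags such as "#moneyflip" as single tokens.
    return text.lower().split()

def train(examples):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for text, label in examples:
        tokens = tokenize(text)
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for the post."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, n_docs in doc_counts.items():
        n_tokens = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids log(0).
                score += math.log(
                    (word_counts[label][tok] + 1) / (n_tokens + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training posts, invented for this sketch.
base_examples = [
    ("flip your cash fast dm me #moneyflip", "scam"),
    ("turn 100 into 1000 today dm me #moneyflip", "scam"),
    ("sunset at the beach today", "benign"),
    ("my new puppy is so cute", "benign"),
]
# Hand-labelled Knight posts: they reuse scam vocabulary but are benign.
knight_examples = [
    ("beware this account is a scammer do not send cash #moneyflip", "benign"),
    ("scammer alert do not dm this fake #moneyflip account", "benign"),
]

knight_post = "do not send cash to this scammer #moneyflip"
scam_post = "flip your cash fast dm me #moneyflip"

before = train(base_examples)
after = train(base_examples + knight_examples)

print(predict(before, knight_post))  # scam (the false positive)
print(predict(after, knight_post))   # benign
print(predict(after, scam_post))     # scam
```

Before the Knight examples are added, the only words the toy model recognises in the Knight post are "cash" and "#moneyflip", both of which it has seen only in scams. Once labelled Knight posts are in the training set, words like "scammer" and "do not send" become evidence for the benign class, while the genuine scam post is still flagged.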

After adding Knights and Knaves to our training set and recalibrating our classifier, we sharpened the model’s boundary between what did and what did not constitute a scam. The overall accuracy of the final model sits at over 99%. The model is able to consistently distinguish between benign posts and scam posts, including Knights and Knaves. Its ability to classify such boundary cases demonstrates its effectiveness and shows that it generalizes well without giving up much precision.
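A single overall accuracy figure can hide how a model does on boundary cases, so it can help to break accuracy down by post type. The sketch below is one hypothetical way to do that in Python. The group names, the `accuracy_by_group` helper and the sample predictions are all invented for illustration and are not results from the study.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute accuracy per group from (group, true_label, predicted_label)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        if truth == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}

def overall_accuracy(records):
    """Fraction of all records where the prediction matches the true label."""
    hits = sum(1 for _, truth, predicted in records if truth == predicted)
    return hits / len(records)

# Made-up predictions for illustration only; not results from the study.
records = [
    ("ordinary", "benign", "benign"),
    ("ordinary", "benign", "benign"),
    ("scammer", "scam", "scam"),
    ("scammer", "scam", "scam"),
    ("knight", "benign", "scam"),    # false positive on a Knight
    ("knight", "benign", "benign"),
    ("knave", "scam", "scam"),
    ("knave", "scam", "benign"),     # false negative on a Knave
]

per_group = accuracy_by_group(records)
overall = overall_accuracy(records)
print(overall)    # 0.75
print(per_group)  # ordinary and scammer at 1.0, knight and knave at 0.5
```

In this toy data every mistake falls on a Knight or a Knave, which is exactly the kind of pattern a per-group breakdown surfaces and a single number does not.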

It’s subtleties such as the differences between Knights and Knaves that make social media threats difficult to identify accurately and consistently. The data set is vast and dynamic, and attackers can refine their approach like never before. In a world of big data and small margins, machines reign supreme on the battlefield. When it comes to Knights and Knaves, we bring a stealth bomber to a sword fight.
