Cheryl Andrea - Traffic at 3am?
Abstract
Thankfully, this isn’t about road traffic but, ironically, website traffic. And if you investigate what’s causing it, you find an endless stream of requests from a battalion of AI scraper bots, visiting your website for nothing but content to train their machine learning models on. It’s like a plague: they hunt you for your content, and you’re out there battling with a toothpick. In more technical terms, your toothpick is robots.txt, a file that politely asks bots not to scrape content from your website. Not that most AI bots care (only the well-behaved ones do); it was just a suggestion anyway.
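To make the "just a suggestion" point concrete, here is a minimal sketch of what honoring robots.txt looks like from the bot's side, using Python's standard urllib.robotparser. The robots.txt content, the bot names and the URL are all hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks one (made-up) AI bot to stay away,
# and lets every other crawler in.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/articles/1"  # hypothetical URL
    print(may_fetch("ExampleAIBot", url))       # the bot that was asked to leave
    print(may_fetch("FriendlySearchBot", url))  # any other crawler
```

Note that the check runs entirely inside the crawler. A bot that never calls anything like may_fetch simply requests the page, and nothing in robots.txt stops it.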
It’s definitely a growing concern. Content and website owners are tired of having their content scraped without their approval and of dealing with the overhead costs that AI scraper bots bring. This presentation dives deep into the different tactics people and companies are adopting to mitigate AI bots and the issues they bring. So far, it feels like you’re playing checkers while AI companies are playing chess. In most cases, a defense is evaded within a month, or someone comes up with an offensive tactic that turns out to have little impact. Do we make these bots show proof of work? Or maybe make them pay to scrape content? (At least we’d be getting some money for losing our content.) Or do we go the other way and try to obliterate them, exhausting the scraper’s resources in return? How about poisoning the soup (the training data for the ML model)? Or is the soup already poisoning itself, given the vast amount of AI-generated content being posted anyway? Do we even bother trying in the first place, or do we block every bot, even the search engine ones? Lots of questions, so let’s dive into them together!
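As one possible reading of the proof-of-work question, here is a minimal hashcash-style sketch: the server hands out a challenge, and the client has to find a nonce whose SHA-256 hash starts with a given number of zero bits before it gets served. The function names, the challenge string and the difficulty value are assumptions made for illustration, not a description of any particular tool.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force nonces until one passes verification."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


if __name__ == "__main__":
    challenge = "hypothetical-session-token"
    nonce = solve(challenge, difficulty=12)  # about 2**12 hashes on average
    print(nonce, verify(challenge, nonce, difficulty=12))
```

The asymmetry is the point: verifying costs one hash, while solving costs roughly 2^difficulty hashes on average. That is barely noticeable for a single human visitor, but it adds up for a scraper requesting millions of pages.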
Biography
If there’s anything you need to know about Miss Cheryl Andrea Fernando, it’s most definitely her love-hate relationship with AI. Sure, AI has helped her fix 1 in 5 bugs without making things significantly worse, but it’s the other 4 that make her question why she asked AI for help in the first place. Cheryl has a background in computer science and engineering, with a strong interest in cybersecurity and an even stronger interest in helping safeguard the world against the exploitative nature of AI. As a social human being, she notices how often people turn to ChatGPT without a second thought, even for the simplest tasks, and to her the growing lack of originality, creativity and cognitive ability feels like a human side effect of AI. Currently working at SURF on this very topic of AI bot mitigation, she hopes to come up with a solution that helps content owners safeguard their content from the pesky AI scraping bots that crawl the internet endlessly to feed their ever-growing machine learning models. Cheryl has plenty of creative passions and hobbies, like crochet, sewing and music, alongside being chronically online and missing the days when she could quite easily tell whether something was AI generated.
