The all-you-can-eat days of trawling data to train AI are winding down. As always, the companies that precipitated the problem, by pushing content creators to shut down access, are the ones that will be least impacted by it. They will just pay their way out and also control what the AI does, so users will see filtered and distorted responses that are aligned with the company's interests. The up-and-comers will have to choose between dying out from lack of data or working around the barriers somehow:
Another challenge is that while publishers can try to stop A.I. companies from scraping their data by placing restrictions in their robots.txt files, those requests aren’t legally binding, and compliance is voluntary. (Think of it like a “no trespassing” sign for data, but one without the force of law.)
Major search engines honor these opt-out requests, and several leading A.I. companies, including OpenAI and Anthropic, have said publicly that they do, too. But other companies, including the A.I.-powered search engine Perplexity, have been accused of ignoring them.
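To make the "voluntary" part concrete, here is a minimal sketch of what honoring robots.txt looks like from the crawler's side, using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleAIBot` user agent, the `is_allowed` helper and the example.com URL are all made up for illustration; they are not taken from any real publisher or crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one (made-up) AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical article URL on a placeholder domain.
url = "https://example.com/articles/some-story"

print(is_allowed(ROBOTS_TXT, "ExampleAIBot", url))  # blocked by the first rule
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", url))  # falls through to the * rule
```

Note that the check runs entirely inside the crawler. A scraper that never calls anything like `is_allowed`, or that ignores the answer, can still request the page, which is why the quoted passage compares robots.txt to a sign rather than a lock.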
The likes of Perplexity might have to throw in the towel eventually. Either they turn to some horrible ad-supported model where the AI responds in whatever way is most advantageous to advertisers, or they get acquired by one of the big companies that doesn't want the annoyance of a pesky little newcomer trying to upstage it. Either way, we the users of search end up holding the bag.