Source link : https://tech365.info/perplexity-defensive-over-ignoring-robots-txt-and-stealing-information/
Perplexity’s logo surrounded by lights and flowers. Image source: Perplexity
Perplexity was found to be actively bypassing website blocks to scrape content in 2024, and a new report shows that it has continued with increasing sophistication as the company defends the practice.
Apple received significant blowback when it was discovered that Applebot had been crawling the web for years to gather data to train Apple Intelligence. Websites immediately blocked the bot, along with others, which led to some interesting discoveries about how AI companies are operating.
A year on, at least one company is still doing everything in its power to ignore robots.txt and scrape webpages anyway: Perplexity. According to a report from Cloudflare, Perplexity is using several techniques to undermine the trust expected on the web and access data to train its large language models.
Testing was conducted by creating new websites that had never been scraped before, then asking Perplexity AI about them. When the crawling bot encountered a robots.txt file that told it not to crawl, a new bot with a different user agent, IP address, and even a new ASN appeared.
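To illustrate why a changed user agent matters, here is a minimal sketch of how a robots.txt rule is evaluated, using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the site `example.com`, and the browser-style user agent string are hypothetical placeholders, not values taken from the Cloudflare report. The sketch only shows that robots.txt rules are matched against whatever name a crawler declares for itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks a crawler that identifies itself as
# "ExampleBot" and allows everyone else. Names are illustrative only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""

# Illustrative generic browser user agent (an assumption, not from the report).
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/private/page"
    print(is_allowed(ROBOTS_TXT, "ExampleBot", page))  # False
    print(is_allowed(ROBOTS_TXT, BROWSER_UA, page))    # True
```

The same page is off limits to the declared bot but permitted for a client presenting a browser-like user agent, because robots.txt is advisory and relies on crawlers identifying themselves honestly.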
Then, Perplexity was able to provide information that was available only on the website. It was clear that Perplexity was operating this new bot, even though it was unlabeled and its IP did not appear in Perplexity’s official IP range.
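Checking whether a request's IP falls inside a crawler's published range can be sketched with Python's standard `ipaddress` module. The ranges below are reserved documentation networks standing in for a real published list, and the function name is hypothetical. This is an assumption-laden illustration, not the method used in the report.

```python
import ipaddress

# Placeholder ranges (reserved documentation networks), standing in for a
# crawler operator's published IP list. These are NOT real crawler ranges.
DECLARED_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def in_declared_ranges(ip: str, ranges: list[str]) -> bool:
    """Return True if ip belongs to any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)


if __name__ == "__main__":
    print(in_declared_ranges("192.0.2.17", DECLARED_RANGES))   # True
    print(in_declared_ranges("203.0.113.5", DECLARED_RANGES))  # False
```

A request that fetches blocked content from an address outside the declared ranges, as in the second call, is the kind of traffic the article describes as unlabeled.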
The…
—-
Author : tech365
Publish date : 2025-08-05 04:40:00
Copyright for syndicated content belongs to the linked Source.
—-