ByteDance Boosts Data Collection with New Web Scraper

ByteDance, the parent company of TikTok, is significantly expanding its web data collection to bolster its generative AI models. A recent study by Kasada and Dark Visitors found that ByteDance has launched a new web scraper named Bytespider, which is quickly becoming one of the most aggressive data collectors on the internet. According to the study, Bytespider now collects data faster than the crawlers operated by major tech companies such as Google, Meta, Amazon, OpenAI, and Anthropic.

According to Sam Crowther, CEO of Kasada, Bytespider collects data roughly 25 times faster than OpenAI’s GPTBot, which gathers data for ChatGPT, and an estimated 3,000 times faster than ClaudeBot, Anthropic’s data-gathering tool. The findings also indicate a significant uptick in Bytespider’s scraping activity over the past several weeks, signaling ByteDance’s determination to compete in the generative AI landscape.

Despite facing a potential ban in the United States due to national security concerns, ByteDance’s aggressive data collection efforts show no signs of slowing down. The company is reportedly developing a new large language model (LLM), which may enhance TikTok’s search functionality, suggesting that ByteDance aims to strengthen its market position.

Another concerning aspect of Bytespider’s operation is its apparent disregard for robots.txt, the widely accepted protocol that websites use to tell automated crawlers which pages they should not access. Although robots.txt is not legally enforceable, ignoring it reflects a broader trend among AI companies that prioritize aggressive data scraping to secure extensive datasets.
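
For readers unfamiliar with how robots.txt works in practice, the following is a minimal sketch of the check a well-behaved crawler would be expected to run before fetching a page. It uses Python's standard `urllib.robotparser` module. The robots.txt rules, the `example.com` URLs, and the `is_allowed` helper are hypothetical and chosen only for illustration; nothing here describes how Bytespider or any other named crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (an assumption, not real data):
# it blocks the Bytespider user agent everywhere and keeps all other
# crawlers out of /private/ only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Bytespider", "GPTBot"):
        for path in ("/articles/1", "/private/report"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(EXAMPLE_ROBOTS_TXT, agent, url))
```

The check is purely advisory: the site publishes its rules and the crawler decides whether to honor them, which is why a scraper that skips this step faces no technical barrier.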

As ByteDance continues to expand its data collection strategies, the implications for user privacy and industry ethics remain critical issues for scrutiny. The tech community is closely monitoring how this evolution impacts the generative AI sector.