How Meta’s Scale Deal Upended the AI Data Industry

Posted by Billy Perrigo | 4 hours ago | AI


Meta’s $14.3 billion investment in Scale AI, the leading player in the AI data industry, was a very strange deal indeed.

Meta acquired 49% of the company in the deal announced last Thursday. Scale announced that its CEO, Alexandr Wang, would quit to become an executive in charge of a new “Superintelligence” unit inside the tech giant. (The deal has yet to receive regulatory approval.)

The deal was good news for Meta, which was widely seen as falling behind in the AI race and in need of new AI leadership, and for Wang, who at 28 will become one of the most powerful AI players in the tech industry.

But the deal was less obviously beneficial for Scale itself, which is likely to lose lucrative business as a result of its new proximity to Meta. OpenAI and Google, two of Scale’s major clients and Meta’s major rivals, reportedly began winding down their work with Scale in the wake of the deal.

“The labs don’t want the other labs to figure out what data they’re using to make their models better,” says Garrett Lord, the CEO of Handshake, a Scale competitor, who says that demand for his company’s services “tripled overnight” in the wake of the Meta deal. “If you’re General Motors or Toyota, you don’t want your competitors coming into your manufacturing plant and seeing how you run your processes.”

Other Scale competitors say they have seen a similar flurry of dealmaking. “The last week has been completely insane,” says Jonathan Siddharth, CEO of Turing, a business that helps all the major AI companies connect with human experts to create proprietary training data. In the past two weeks, Turing has added potential contracts worth $50 million, Siddharth says, “as frontier labs recognize that advancing AGI requires truly neutral partners.”

“This is the equivalent of an oil pipeline exploding between Russia and Europe,” says Ryan Kolln, the CEO of Appen, another AI training data company, describing the disruption to the industry’s data supply chain. “Customers are really quickly evaluating: how do they get alternative supply?”

Kolln adds: “Now, with Meta being such a large owner of Scale, the ability for [Meta] to get information around what the other foundation model labs are doing becomes a lot more challenging to manage.”

Multiple Scale employees have signed contracts to move to two rival data firms in the last week, according to people with direct knowledge of hiring processes.

A Scale AI spokesperson had no comment, but pointed TIME toward a report that quoted OpenAI’s chief financial officer saying that OpenAI would continue to work with Scale following the Meta investment. OpenAI and Google spokespeople declined to comment, but each pointed TIME to reports that said they were winding down their work with Scale. Meta and Anthropic did not respond to requests for comment. (TIME has a technology partnership with Scale AI.)

The amount of money that could ultimately change hands as a result of the Meta deal is immense. Each of the leading AI companies now spends around $1 billion on human data per year, according to Lord — and their data budgets are increasing, not decreasing. As Scale’s competitors jostle to fill the void left by Meta’s dealmaking, the corporate drama points to a fundamental reshaping of how the world’s most valuable AI models get built.

Shifting tides in the data industry

Scale got its start as a data labeling company, marshaling armies of human contractors around the world — mostly in low-income nations like India, Venezuela and the Philippines — who would earn pennies per task to do things like labeling images or answering simple questions.

This type of work was useful in the early stages of AI development, when AI companies were still struggling to teach image models to tell the difference between cats and dogs, or teach language models to string together coherent sentences.

But as AI models have improved, the type of data that AI companies are seeking has changed radically. This change became even more pronounced after the industry moved toward so-called “reasoning” models: AIs that write down a train of thought before settling on an answer. These models are now better than most humans at writing code, carrying out research, and answering complex science questions.

This “reasoning” paradigm led the likes of OpenAI, Google and Anthropic to predominantly seek expert data. The most lucrative training data is now written by people with PhDs, who write down the exact steps they take while solving problems, so that AI models can learn to mimic this behavior. 

“The industry is shifting towards needing smarter and smarter humans,” says Siddharth, the Turing CEO. “For some areas, even a single expert human is not enough to move the needle. You need a team of expert humans.”

What exactly each AI company asks its expert humans to do is a closely guarded secret. All AI labs tend to converge around the same strategies over time, insiders say, but the longer each lab can keep its training processes secret, the more time it can spend at the “frontier” of the industry, with its AI models performing better than its rivals’.

That’s why Meta’s big investment in Scale seems to have unnerved all the frontier AI companies. Meta may currently be behind in the AI race — but if it can access some of its rivals’ most precious secrets, there’s a chance it could begin to rapidly close the gap.



Time
