What Apple’s AI Warning Means For Business Leaders

Posted by Bernard Marr, Contributor


A groundbreaking Apple research paper has sent shockwaves through the AI community, revealing serious limitations in today’s most advanced models, flaws that have gone undetected until now.

The paper, “The Illusion of Thinking,” shows that the “chain-of-thought” reasoning applied by advanced models like GPT-4, DeepSeek, and Claude Sonnet suffers from “complete accuracy collapse” when tasks become too complex.

And the most worrying aspect seems to be that once tasks are complicated enough, throwing more processing power, tokens or data at them does little to help.

This has obvious implications for big-picture ideas that we’ve become accustomed to hearing, such as AI solving huge challenges like climate change, energy shortages, or global poverty.

Large Reasoning Models, or LRMs, are the problem-solving engines powering agentic AI. Some consider them to be a step on the path towards artificial general intelligence, AI that can apply its learning to any task, just like humans can. Huge amounts of investment have been made in developing them, as they are considered the most advanced and useful AI models available today.

But does this mean billions of dollars worth of investment have been poured into what is essentially a technological dead end?

I don’t think so. But I do believe there are important lessons to be learned for businesses and organizations looking to unlock the true potential of AI, so let’s take a closer look.

Findings

The headline premise of the report is that AI “thinking” may just be an illusion, rather than a genuine counterpart to the reasoning humans use to solve problems in the real world.

This is supported by findings of “accuracy collapse,” which show that while LRMs excel at managing low-complexity tasks, as complexity increases, they eventually reach a point where they fail completely.

Perhaps most unexpectedly, the models appear to throw in the towel, using fewer tokens and putting in less effort once the task becomes too complex.

And even if they are explicitly told how to solve the problem, they will often fail to do so, casting doubt on our ability to train them to move past this behavior.

These are important findings because, in business AI, the prevailing belief has often been that bigger is better, meaning bigger datasets, bigger models, and more tokens. Apple’s findings suggest that beyond a certain point, the benefits of scale dissipate and eventually break down.

The implication is that usefulness also diminishes when AI is asked to perform tasks that are too complex, such as formulating broad, high-level strategies in chaotic real-world scenarios or complex legal reasoning.

What Does This Mean For Businesses Today?

Rather than an insurmountable obstacle, I see this as a signpost that generative language AI shouldn’t be treated as a magic bullet to solve all problems.

For me, there are three key lessons here.

Firstly, focusing the attention of AI on structured, low-to-mid complexity tasks is more likely to hit the sweet spot.

For example, a law firm shouldn’t expect a model to simply produce a winning case strategy. That problem is too complex and open-ended, and will inevitably lead to generic, useless output once the model reaches the point where it can no longer reason effectively.

The firm can, however, use it to extract relevant points from contracts, create summaries of relevant prior case law, and flag up risks.
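To make this first lesson concrete, here is a minimal sketch of what scoping AI to bounded, checkable subtasks might look like in practice. It is an illustration only: call_model is a hypothetical placeholder for whatever LLM endpoint a firm actually uses, and the subtask prompts are invented examples, not anything drawn from Apple’s paper.

```python
# Minimal sketch: scope the model to narrow, structured subtasks a human can
# verify, rather than one open-ended "win the case" request.
from typing import Callable


def call_model(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM call (swap in your own API client)."""
    return f"[model output for: {prompt[:60]}...]"


# Each subtask is low-to-mid complexity and easy for a lawyer to review.
SUBTASKS = {
    "key_clauses": "List the termination, liability, and indemnity clauses in this contract:\n{doc}",
    "case_summary": "Summarise the holdings of this prior case in five bullet points:\n{doc}",
    "risk_flags": "Flag any clauses in this contract that deviate from our standard terms:\n{doc}",
}


def review_document(doc: str, model: Callable[[str], str] = call_model) -> dict:
    """Run each bounded subtask separately instead of asking for a full strategy."""
    return {name: model(template.format(doc=doc)) for name, template in SUBTASKS.items()}


if __name__ == "__main__":
    for name, output in review_document("...contract text...").items():
        print(f"{name}: {output}")
```

The design choice here is simply to keep each request inside the complexity range where reasoning models remain reliable, and to keep the outputs small enough for a human to check.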

Secondly, the findings emphasize the importance of the human in the loop: the vital element of human oversight needed to ensure AI is used responsibly and accountably.

Thirdly, when “accuracy collapse” is a danger, learning to recognize the signs, such as a drop in token use as the model gives up its attempts at reasoning, is critical to mitigating its impact.
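As a rough illustration, and purely my own sketch rather than anything prescribed by Apple’s paper, the check below flags runs where task complexity went up but the model’s reasoning-token use fell sharply, one possible warning sign of collapse. The token counts would come from your provider’s usage metadata, and the complexity score is whatever measure fits your workload.

```python
# Heuristic sketch: flag runs where the model may have "given up" on reasoning.
from dataclasses import dataclass


@dataclass
class Run:
    complexity: int        # your own task-complexity score, e.g. number of constraints
    reasoning_tokens: int  # tokens the model spent on its reasoning trace for this run


def collapse_warnings(runs: list[Run], drop_ratio: float = 0.5) -> list[Run]:
    """Flag runs where complexity rose but reasoning effort fell sharply."""
    ordered = sorted(runs, key=lambda r: r.complexity)
    flagged = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr.complexity > prev.complexity and curr.reasoning_tokens < prev.reasoning_tokens * drop_ratio:
            flagged.append(curr)  # effort dropped as the task got harder
    return flagged


if __name__ == "__main__":
    history = [Run(2, 900), Run(4, 1500), Run(6, 2200), Run(8, 600)]
    for run in collapse_warnings(history):
        print(f"Possible collapse at complexity {run.complexity}: "
              f"only {run.reasoning_tokens} reasoning tokens used")
```

A flag like this is not proof of failure; it is simply a prompt for a human to look more closely at the output before acting on it.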

Playing to the strengths of AI while cushioning against the impact of its weaknesses is the name of the game.

So, Has AI Hit A Dead End?

In my opinion, Apple’s research doesn’t herald a “dead end” or end-of-the-road scenario for AI. Instead, it should be used by businesses to help them focus on areas where they are likely to succeed and to understand where they should build resilience against AI failure.

Understanding the limitations of AI shouldn’t stop us from benefiting from it. But it does help us avoid situations where reasoning collapse could cause serious harm or damage, or simply waste time and money.

Agentic AI has the potential to help in this regard, with its ability to deploy various tools to bridge the gaps in situations where reasoning alone is insufficient. Similarly, the concept of explainable AI is important because designing systems to be transparent means that when a collapse does occur, we will have a better understanding of what went wrong.

Certainly, no one should expect AI to always work perfectly and produce the best solution to every possible problem. However, the more we understand it, the more we can leverage its strengths and the more likely we are to create genuine value.



