Two Dozen Signs Of AI Acceleration

Right now, lots of people are obsessed with diagnosing the state of AI. How far have we come in our exploration of large language models?
We know that things are changing quickly. People are grasping for numbers that show not only how AI is doing in a market and enterprise context, but also how experts expect it to be deployed over the next two quarters, until we ring in 2026.
First, there’s the set of numbers I heard on a recent episode of AI Daily Brief, where Nathaniel Whittemore collected no fewer than 15 charts to go over the nuts and bolts of AI adoption. You might be forgiven for thinking that a podcast is a strange venue for charts, since the audience can’t actually see what the host is looking at, but I want to showcase some of the numbers as an example:
- ChatGPT reaching 100 million users in just a few months, beating prior records set by platforms like TikTok
- Anthropic’s quick scaling of revenue
- Google’s doubling in token processing within two months
- Growth in companies moving from experimentation to deployment of AI, from 11% to 33% in two quarters (see the quick arithmetic sketch after this list)
- 45% of a surveyed worker group willing to interact with AI agents at work
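One way to read that experimentation-to-deployment jump: tripling in two quarters implies a steep compounding rate. Here is a minimal back-of-the-envelope sketch, assuming steady quarter-over-quarter growth (an assumption of mine, not a claim from the podcast):

```python
# Implied quarter-over-quarter growth if the share of companies deploying AI
# tripled (11% -> 33%) over two quarters, assuming steady compounding.

start, end, quarters = 0.11, 0.33, 2
quarterly_factor = (end / start) ** (1 / quarters)  # growth multiplier per quarter
print(f"Implied growth: ~{quarterly_factor - 1:.0%} per quarter")
# Implied growth: ~73% per quarter
```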
That’s one general overview of what’s going on in the AI world. Clearly, we’re in a time of great change.
Notes from IEEE
The Institute of Electrical and Electronics Engineers is a big name in technological analysis. With some 460,000 members in 190 countries, the trade group gets attention from those trying to read the tea leaves on tech development. I was looking at an article by Eliza Strickland providing more data points on AI. One thing the IEEE piece dug into was the falling cost of inference, which was cut in half over a span of just a few months.
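To get a feel for what a halving like that implies, here is a quick sketch of the arithmetic. The three-month window below is an assumption for illustration, not a figure from the IEEE piece:

```python
# Back-of-the-envelope: if inference cost halves over a given window,
# what monthly rate of decline does that imply?

def implied_monthly_decline(halving_months: float) -> float:
    """Fractional cost drop per month implied by a halving period."""
    monthly_factor = 0.5 ** (1.0 / halving_months)  # cost multiplier each month
    return 1.0 - monthly_factor

months = 3.0  # assumed halving window, purely illustrative
drop = implied_monthly_decline(months)
print(f"Halving over {months:.0f} months implies ~{drop:.0%} cheaper each month")
# Halving over 3 months implies ~21% cheaper each month
```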
Among other findings: that Chinese models are catching up to American ones, that corporations are getting on board with AI investment, and that LLMs are becoming more adept at clinical diagnosis.
The Last One?
Another item the IEEE piece covered was AI’s performance on something called Humanity’s Last Exam, which sounds ominous in a kind of doomsday-clock way. HLE is a sweeping crowdsourced benchmarking effort, drawing on 1,000 subject-expert contributors from 500 institutions across 50 countries. Here’s how the project is introduced on its own website:
“Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity’s Last Exam, a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. The dataset consists of 2,500 challenging questions across over a hundred subjects. We publicly release these questions, while maintaining a private test set of held out questions to assess model overfitting.”
So basically, leaders in AI analysis are trying to create stronger, more difficult benchmarking tasks, because AI is doing so well on the existing ones.
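The public-plus-private design mentioned in that excerpt has a concrete purpose: if a model has trained on the released questions, it will score better on them than on the held-out set. Here is an illustrative sketch of that check; the scores and the threshold are made-up values, not real HLE results:

```python
# Sketch of an overfitting check using a public/private question split.
# A large positive gap suggests the model has seen the public questions.

def overfitting_gap(public_accuracy: float, private_accuracy: float) -> float:
    """Gap between accuracy on public questions and on held-out ones."""
    return public_accuracy - private_accuracy

public_acc, private_acc = 0.09, 0.06  # hypothetical fractions correct
gap = overfitting_gap(public_acc, private_acc)
print(f"Public {public_acc:.0%} vs. held-out {private_acc:.0%} -> gap of {gap:.0%}")
if gap > 0.02:  # arbitrary threshold for this sketch
    print("Gap is large relative to the scores; possible overfitting to the public set")
```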
What the IEEE found, in terms of HLE, is that even top models like OpenAI’s o1 still post single-digit accuracy percentages, which, experts point out, is a good sign for the usefulness of the challenge.
Is Grok Ahead?
I also wanted to include one more data set, this time from Epoch AI, where presenters ranked individual models by training compute, measured in floating-point operations. One stood out: take a look at the positioning of the Grok 3 model, sitting far apart from the scattershot cluster of its peers.
And then think about this: as of July 9, Grok 4 is already out.
In coverage published July 10 at AI Gyani, Shikha Singare tells us how Grok sits atop the ARC-AGI-2 leaderboard.
“Grok 4 … (delivers) ~18% accuracy, clearly ahead of Claude Opus 4 (~10%) and GPT-4.5 or o3 variants,” Singare writes. “While Grok 4 is more expensive per task, it leads in raw performance—a compelling choice when task accuracy is mission-critical.”
Singare also brings up something called Vending-Bench, a dynamic market simulation, where the model did exceedingly well.
“Grok 4 achieved a stunning mean net worth of $4694.15, far ahead of Claude Opus 4 ($2077.41) and even human participants ($844.05),” Singare adds. “It also sold the highest number of units on average: 4569, indicating strong demand-generation capabilities.”
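For scale, here is the same comparison reduced to ratios, using only the figures quoted above:

```python
# Quick arithmetic on the Vending-Bench figures quoted above:
# Grok 4's lead, expressed as a multiple of each competitor's result.

grok4 = 4694.15  # mean net worth in dollars, per Singare's writeup
others = {
    "Claude Opus 4": 2077.41,
    "human participants": 844.05,
}

for name, net_worth in others.items():
    print(f"Grok 4 finished with {grok4 / net_worth:.1f}x the net worth of {name}")
# Grok 4 finished with 2.3x the net worth of Claude Opus 4
# Grok 4 finished with 5.6x the net worth of human participants
```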
Was that a couple of dozen insights? Not all of them were in bullet points. I think that if you put all of this together and connect the dots, you get a compelling picture of how AI is going to accelerate through the latter half of this year and into the next.