Two years ago today, on 30th November 2022, ChatGPT was launched to the world and set off the fastest user-base growth in history, reaching 100 million users within two months. Much has been written about its capabilities, and it's already a really useful tool with many applications, but it hasn't fulfilled the most hyped promises. It hasn't replaced swathes of jobs or removed our administrative burden forever. It isn't even reading and classifying my emails yet. Why?
What impressed us about ChatGPT were its emergent capabilities: it could undertake tasks it wasn't trained on. Ask it just about anything and it would give a plausible answer. The problem, as we know, was its tendency to hallucinate, and over time we understood that it had some fundamental limits, especially its weakness at reasoning (or “thinking”) through complex problems and its lack of long-term memory.
The real world is complex. We need to work on problems, think about them, and follow processes. We need to reread our work or test our output and improve it until it’s just right. We don’t work on a zero-shot basis like LLMs.
Now, of course, we found pretty early on that you could ask an LLM to improve its answer, perhaps by asking it to think “step by step”, and as context windows grew you could feed the model more information to ground its response.
With RAG (retrieval-augmented generation), you can feed custom data relevant to a question or task into the prompt to give the LLM more context. This is great for search and for certain information sifting and searching use cases, but it isn't the leap forward we expected or hoped for, the one that would forever remove our administrative burden.
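To make that concrete, here's a minimal RAG sketch in Python. The document snippets, the toy keyword retriever, and the gpt-4o-mini model name are stand-ins for illustration; a real pipeline would use embeddings, a vector store, and re-ranking.

```python
# A minimal RAG sketch: retrieve relevant passages, then let the LLM answer with
# them as context. The toy keyword retriever stands in for a real vector-store lookup.
from openai import OpenAI

client = OpenAI()

DOCUMENTS = [
    "Invoices are processed on the last working day of each month.",
    "Expense claims must be submitted within 30 days of purchase.",
    "The office is closed on public holidays.",
]

def search_documents(query: str, top_k: int = 2) -> list[str]:
    """Toy retriever: rank passages by naive keyword overlap with the query."""
    words = set(query.lower().split())
    scored = sorted(DOCUMENTS, key=lambda d: -len(words & set(d.lower().split())))
    return scored[:top_k]

def answer_with_rag(question: str) -> str:
    context = "\n\n".join(search_documents(question))
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; any chat model works
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer_with_rag("When are expense claims due?"))
```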
Enter Chain of Thought and Agentic AI.
What happens if you continually feed the output of an LLM back into the model? What if you ask a model to produce multiple answers and choose the best or most consistent one? What if you had models that were trained just to analyse the outputs of other models? What if you could allocate more compute power to harder parts of a problem?
This is the basis of Chain of Thought prompting. It's the next big leap forward in LLMs, and it's going to make its impact felt in 2025. In fact, we've already seen a preview with OpenAI's reasoning model, o1.
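As a rough illustration of the “many answers, pick the most consistent one” idea, here's a self-consistency sketch. The sampling temperature, the ANSWER: convention, and the model name are assumptions for the example, not any particular vendor's recipe.

```python
# A rough self-consistency sketch: sample several chain-of-thought answers at a
# non-zero temperature, then take a majority vote over the final answers.
from collections import Counter
from openai import OpenAI

client = OpenAI()

def self_consistent_answer(question: str, samples: int = 5) -> str:
    finals = []
    for _ in range(samples):
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model name
            temperature=0.8,      # diversity between samples is the point
            messages=[{
                "role": "user",
                "content": (f"{question}\nThink step by step, then give the final "
                            "answer on its own line, prefixed with 'ANSWER:'."),
            }],
        )
        for line in response.choices[0].message.content.splitlines():
            if line.startswith("ANSWER:"):
                finals.append(line.removeprefix("ANSWER:").strip())
                break
    # The most common final answer stands in for "the most consistent" one.
    return Counter(finals).most_common(1)[0][0] if finals else ""
```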
The problem we’re facing is that although LLMs are still gradually improving, there’s evidence that we’re approaching the limit of how good they can be. It’s the law of diminishing returns, combined with the fact that there’s only so much (non-synthetic) training data in the world, and only so much money you can spend on training models.
So the answer is to use several customised models, each trained for a specific task, with a chain of thought that breaks complex tasks down into smaller, manageable steps and acts autonomously to fulfil a series of goals.
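In code, a bare-bones version of that agentic loop might look like the sketch below: one model plans the steps, another executes them, and the results are fed back each iteration. The model names and prompts are placeholders rather than any specific framework's API.

```python
# A stripped-down agentic loop: a planner model breaks the goal into steps and a
# worker model executes each step, with results fed back into the next prompt.
from openai import OpenAI

client = OpenAI()

def ask(model: str, prompt: str) -> str:
    """Single chat-completion call; thin wrapper shared by planner and worker."""
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def run_agent(goal: str) -> str:
    # Planner: break the goal into short, ordered steps.
    plan = ask("gpt-4o", f"Break this goal into short numbered steps:\n{goal}")
    steps = [line for line in plan.splitlines() if line.strip()]
    notes: list[str] = []
    for step in steps:
        # Worker: carry out one step, seeing what has been done so far.
        log = "\n".join(notes)
        result = ask(
            "gpt-4o-mini",
            f"Goal: {goal}\nCompleted so far:\n{log}\n"
            f"Now do this step and report the result:\n{step}",
        )
        notes.append(f"{step} -> {result}")
    # Final pass: summarise the work log into one answer.
    log = "\n".join(notes)
    return ask("gpt-4o", f"Goal: {goal}\nWork log:\n{log}\nWrite the final result.")
```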
Why 2025 is the year of Agentic AI
On reflection, we were probably too quick to talk up the capabilities of LLMs; they clearly need more work before they can autonomously replace business processes, or even be a useful copilot.
But in 2025, the scene is set for Agentic AI for several reasons:
- Powerful open-weight models are available, such as Meta's Llama, Google's Gemma, and Mistral Large.
- We have the ability to fine-tune very capable smaller open-weight models within a reasonable budget.
- There are improved developer APIs for models, such as Structured Outputs, which provide insight into the model's thinking (see the sketch after this list).
- We expect new chain-of-thought and test-time-compute capabilities to be rolled out widely in open-weight models soon (and there are already guides on how to do this manually).
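As a small example of the Structured Outputs point above, here's a hedged sketch using the OpenAI Python SDK's schema-constrained parsing helper; the StepByStepAnswer schema is purely illustrative and not part of the API.

```python
# A sketch of Structured Outputs: the model is asked to return its reasoning steps
# and final answer as a typed object instead of free text.
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class StepByStepAnswer(BaseModel):
    # Illustrative schema: expose intermediate steps alongside the answer.
    steps: list[str]
    final_answer: str

completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",  # assumed model name
    messages=[{"role": "user", "content": "How many weekdays are there in March 2025?"}],
    response_format=StepByStepAnswer,
)
result = completion.choices[0].message.parsed
print(result.steps, result.final_answer)
```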
If the past two years have demonstrated both the promise and the limitations of large language models, the next year will be about realising the value of this technology.
So for those who have been asking “are we nearly there yet?” in the context of AI, the answer is that we’re getting close. Maybe AI will fully arrive and reshape our world in 2025. But even if it doesn’t, it’ll be a fun ride along the way.