Why Is Retrieval Augmented Generation Or RAG Popular Today?

Posted by John Werner, Contributor


There’s an approach in AI called Retrieval Augmented Generation that’s becoming a key way to get more targeted results from models.

You could say that it’s like chocolate and peanut butter – two great tastes that taste great together.

Or you could describe it in more technical terms. Essentially, Retrieval Augmented Generation means supplying the LLM with information it should know at the moment it applies its own training data and knowledge to a task.

Over at GeeksforGeeks, experts explain it this way:

“In traditional LLMs, the model generates responses based solely on the data it was trained on, which may not include the most current information or specific details required for certain tasks. RAG addresses this limitation by incorporating a retrieval mechanism that allows the model to access external databases or documents in real-time.”

Then there’s a nice flow chart with “data chunks” and other components, showing how this type of thing works.
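I can’t reproduce that chart here, but the “data chunks” part is easy to illustrate. Here’s a minimal sketch – my own toy example, not code from GeeksforGeeks – of how a document might be cut into overlapping chunks before they’re indexed for retrieval:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split a document into overlapping, fixed-size character chunks.

    Real pipelines often chunk by tokens, sentences, or sections, but the
    idea is the same: small pieces that can be indexed and retrieved on their own.
    """
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks


# A stand-in for one of your own documents (purely illustrative content).
white_paper = (
    "Our company sells a subscription analytics product to schools. "
    "Pricing is tiered by the number of students, and support is included. "
    "Next year we plan to add a parent-facing mobile app. "
) * 3

data_chunks = chunk_text(white_paper)
print(f"{len(data_chunks)} data chunks ready to be embedded and indexed")
```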

Think about how this would work in practice – for example, you might give a chatbot a series of white papers about your business, and then ask it questions about your business model. Or, on a personal level, if you want the AI to understand you better, you might give it personal documents, like diary recordings or some of your past writing, to help it build a better knowledge of you as a person.

In a very broad sense, you could say that RAG involves adding anything that wasn’t in the original training set. That might be for reasons of nuance, or timing, or purpose, or it might just be to help target the result the way you want.

Getting to the Point

I really like this next one.

At Learn By Building AI, Bill Chambers explains that there’s a simple approach to RAG.

First, he contrasts it with this, which he says he found at Facebook:

“Building a model that researches and contextualizes is more challenging, but it’s essential for future advancements. We recently made substantial progress in this realm with our Retrieval Augmented Generation (RAG) architecture, an end-to-end differentiable model that combines an information retrieval component (Facebook AI’s dense-passage retrieval system) with a seq2seq generator (our Bidirectional and Auto-Regressive Transformers [BART] model). RAG can be fine-tuned on knowledge-intensive downstream tasks to achieve state-of-the-art results compared with even the largest pretrained seq2seq language models. And unlike these pretrained models, RAG’s internal knowledge can be easily altered or even supplemented on the fly, enabling researchers and engineers to control what RAG knows and doesn’t know without wasting time or compute power retraining the entire model.”

Good grief…

Chambers then provides a neat little drawing that shows a “corpus of documents” getting connected to an LLM through user input.

That made sense to me: RAG means adding specific information resources! Now, there are technical details, for sure, but I thought the tutorial did a great job overall of breaking this down, so that’s another resource for anyone who wants to learn more about how it actually works.
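To make that concrete, here’s a bare-bones sketch of the pattern the drawing describes: a small corpus of documents, a retrieval step, and a prompt handed to the model. The word-overlap scoring and the call_llm stub are my own stand-ins for illustration, not code from the tutorial:

```python
corpus = {
    "pricing.txt": "Our product is sold as a monthly subscription with three pricing tiers.",
    "roadmap.txt": "Next quarter we plan to launch an analytics dashboard for schools.",
    "history.txt": "The company was founded in 2019 by two former teachers.",
}

def retrieve(question: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the question (a stand-in for real retrieval)."""
    q_words = set(question.lower().split())
    scored = sorted(
        docs.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

def call_llm(prompt: str) -> str:
    """Stub for whatever model API you actually use (a hosted service, a local model, etc.)."""
    raise NotImplementedError("wire this up to your LLM of choice")

question = "How is the product priced?"
context = "\n\n".join(retrieve(question, corpus))
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
# answer = call_llm(prompt)  # the retrieved context rides along with the user's question
print(prompt)
```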

Using RAG

I also wanted to reference a tech talk at Imagination in Action in April by Soundararajan Srinivasan, Sr. Director of AI Program at Microsoft, and his colleague Reshmi Ghosh, a Sr. Applied Scientist at Microsoft, where the two talked about the practical use of RAG.

Using terms like “knowledge store,” “vector database,” “orchestrator,” and “meta prompt,” Srinivasan went over how these systems can work, saying they help us understand the limitations of AI in context.
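Here’s roughly how I picture those terms fitting together – a hedged sketch of the general pattern, not Microsoft’s actual architecture. The embed function below is a toy keyword counter standing in for a real embedding model, the list of vectors plays the role of the vector database, and the orchestrator assembles the meta prompt:

```python
import math

def embed(text: str) -> list[float]:
    """Toy embedding: real systems call an embedding model here.
    This version just counts a few keywords so the example runs."""
    vocab = ["price", "subscription", "dashboard", "founded", "teachers"]
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return [float(words.count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# "Knowledge store": the raw chunks. "Vector database": their embeddings.
knowledge_store = [
    "Our product is sold as a monthly subscription with three price tiers.",
    "Next quarter we plan to launch an analytics dashboard.",
    "The company was founded in 2019 by two former teachers.",
]
vector_db = [(chunk, embed(chunk)) for chunk in knowledge_store]

def orchestrate(user_query: str, top_k: int = 1) -> str:
    """The 'orchestrator': retrieve from the vector database, then build the meta prompt."""
    q_vec = embed(user_query)
    ranked = sorted(vector_db, key=lambda pair: cosine(q_vec, pair[1]), reverse=True)
    context = "\n".join(chunk for chunk, _ in ranked[:top_k])
    meta_prompt = (
        "You are a helpful assistant. Answer only from the context below.\n"
        f"Context:\n{context}\n\nUser question: {user_query}"
    )
    return meta_prompt  # this string would then be sent to the LLM

print(orchestrate("How much does the subscription cost?"))
```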

And “context” is also an important term because, as he describes, a larger context window adds capability, potentially with a lower memory footprint.

Here are some other reasons the presenters talked about using RAG:

  • To combine knowledge and reasoning
  • To democratize model use
  • For time resource efficiency

Ghosh then talked about how we can tell whether a model actually chooses to use the RAG information in its processing.

“You have all these different contexts that are sent with the query to tell the model that, ‘hey, here’s the external knowledge that you may or may not know,’” she said.

“When we are designing systems with large language models, also small language models like Llama and Phi, we are essentially finding that if you can send in context by compartmentalizing the data points and not fine-tuning it, you are still going to get factual queries answered in a qualitative manner of accuracy.”
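As a rough illustration of that idea – my own sketch, not anything from the talk – compartmentalizing data points can be as simple as sending each fact as its own labeled block alongside the query, with no fine-tuning involved:

```python
def build_compartmentalized_prompt(query: str, sources: dict[str, str]) -> str:
    """Send each data point as its own labeled block alongside the query,
    rather than baking the facts into the model's weights via fine-tuning."""
    blocks = [f"[Source: {name}]\n{text}" for name, text in sources.items()]
    return "External knowledge:\n\n" + "\n\n".join(blocks) + f"\n\nQuestion: {query}"

# Hypothetical data points, purely for illustration.
sources = {
    "Q3 report": "Revenue grew 12% quarter over quarter.",
    "HR handbook": "New hires receive 20 days of paid leave.",
}
print(build_compartmentalized_prompt("How much paid leave do new employees get?", sources))
```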

Ghosh also mentioned multi-modality.

“You can essentially have databases that have images, that have voice notes, that have sounds or music notes of any kind, and you can still build AI applications around it with the same kind of gains, because now you know that the models are tending towards utilizing RAG context and relying less on the internal memory, and this is also opening up new doors for all the new frameworks that are being discussed.”

This, she added, is useful with protocols like MCP (Model Context Protocol) and A2A (agent-to-agent).

That’s important as we move into an era of new interfaces, where we’re not just limited to typing to our AI partners. We have voice now, and more is coming in the future, with image and video generation that will be vibrant enough to replace text-based models.

Some would say we’re entering a world of dreams, where so much is possible that was previously impossible.

RAG might be one component of making sure that we can steer the bus and deliver the kinds of results that we’re looking for. It helps with what you might categorize as “convergence” for a digital intelligence system. So keep an eye on these kinds of methodologies as we continue to design more sophisticated AI tools and resources.


