July 8, 2024
Generative AI, particularly in text-based applications, thrives on its capacity to produce clear and comprehensive responses. This is enabled by training the AI on a vast array of data points. The advantage of this approach is that the text generated is typically user-friendly and answers a wide range of questions, known as prompts, with relevant information.
However, there are limitations. The data used for training these AI models, often large language models (LLMs), might not be up to date. This means that the AI's knowledge could be weeks, months, or even years old. In a corporate AI chatbot, this limitation becomes more pronounced because the AI may lack specific and current information about the company's products or services. This knowledge gap can lead to inaccurate responses, potentially diminishing user trust in the AI technology.
When you build your application on top of a base large language model (LLM) like GPT or Llama, it's important to understand its capabilities and limitations. If you ask such a model a question, you might soon notice that it lacks context about your domain or has only a limited understanding of how your particular products and processes work. Moreover, whatever it knows about current events or specific information stops at its training cutoff, which for some widely used models was as far back as September 2021. This is a key consideration when relying on these AI models for up-to-date information or context-specific inquiries.
Retrieval-Augmented Generation (RAG) is an advanced technique in the field of generative AI that enhances the capabilities of large language models (LLMs). The key feature of RAG is its ability to supplement the output of an LLM with specific, up-to-date information without altering the core model. This targeted information can be more current than what's contained in the LLM, and can be tailored to specific industries or organizations.
The primary advantage of RAG is that it enables generative AI systems to provide answers that are more contextually relevant and based on the most recent data. This approach is particularly valuable in scenarios where the latest information is crucial or where specific industry knowledge is required.
RAG gained prominence in the AI community following the publication of the paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" in 2020. Authored by Patrick Lewis and his team at Facebook AI Research, this paper laid the groundwork for the concept. Since then, RAG has been widely adopted and explored by both academic and industrial researchers, who see it as a way to significantly enhance the utility and accuracy of generative AI systems.
These advantages collectively position RAG as a transformative approach in Natural Language Processing (NLP), overcoming the inherent limitations of traditional language models and significantly enhancing the capabilities of AI-powered applications.
Retrieval-Augmented Generation (RAG) can be understood through a practical example, such as a sports league wanting to provide real-time, detailed information to fans and media through a chat system. Here's how RAG enhances this process:
In summary, RAG represents a significant advancement in the field of AI, particularly for applications that require up-to-date and context-specific information. Its ability to enhance LLMs without the need for constant retraining makes it a valuable tool in developing more responsive and accurate AI-driven chat and information systems.
Consider all the information that an organization has—the structured databases, the unstructured PDFs and other documents, the blogs, the news feeds, the chat transcripts from past customer service sessions. In RAG, this vast quantity of dynamic data is translated into a common format and stored in a knowledge library that’s accessible to the generative AI system.
The data in that knowledge library is then processed into numerical representations (vectors) using a special type of algorithm called an embedding language model and stored in a vector database, which can be quickly searched to retrieve the relevant contextual information.
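The indexing step described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `embed` is a toy stand-in (normalized word counts) for a real embedding language model, and `VectorStore` is a hypothetical in-memory class standing in for a vector database. The sample sentences are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict:
    """Toy stand-in for an embedding model: unit-length word-count vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine(a: dict, b: dict) -> float:
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(v * b.get(w, 0.0) for w, v in a.items())


class VectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (vector, original text)

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> list:
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = VectorStore()
store.add("Tonight's game will be played at Riverside Arena.")
store.add("Season tickets go on sale in March.")
top = store.search("Where is tonight's game played?", k=1)
print(top)
```

A real embedding model produces dense vectors that capture meaning rather than word overlap, and a real vector database uses approximate nearest-neighbor indexes instead of sorting every item, but the add-then-search shape is the same.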
Now, say an end user sends the generative AI system a specific prompt, for example, “Where will tonight’s game be played, who are the starting players, and what are reporters saying about the matchup?” The query is transformed into a vector and used to query the vector database, which retrieves information relevant to that question’s context. That contextual information plus the original prompt are then fed into the LLM, which generates a text response based on both its somewhat out-of-date generalized knowledge and the extremely timely contextual information.
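The query flow above (embed the question, retrieve the closest passages, hand both to the LLM) might look like the following sketch. The passages, the `retrieve` and `build_prompt` helpers, and the prompt wording are all assumptions made for illustration; `embed` is again a toy word-count stand-in for a real embedding model, and the final call to an LLM is left out because it depends on the provider.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: unit-length word-count vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine(a, b):
    return sum(v * b.get(w, 0.0) for w, v in a.items())


# Made-up knowledge library entries.
KNOWLEDGE = [
    "Tonight's game will be played at Riverside Arena at 7 pm.",
    "The starting players tonight are Lee, Okafor, and Marin.",
    "The team store is closed on Mondays.",
]


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = embed(question)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]


def build_prompt(question, context):
    """Combine retrieved context with the original question for the LLM."""
    lines = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{lines}\n\n"
        f"Question: {question}"
    )


question = "Where will tonight's game be played, and who are the starting players?"
context = retrieve(question, KNOWLEDGE, k=2)
prompt = build_prompt(question, context)
print(prompt)  # this string is what would be sent to the LLM
```

Only the two relevant passages reach the prompt; the unrelated store-hours entry is ranked last and dropped.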
Interestingly, while training a general-purpose LLM is time-consuming and costly, updating a RAG system is just the opposite. New data can be passed through the embedding language model and translated into vectors on a continuous, incremental basis. In fact, the answers from the entire generative AI system can be fed back into the RAG system, improving its performance and accuracy, because, in effect, it knows how it has already answered a similar question.
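As a rough sketch of why incremental updates are cheap: adding or replacing one document only requires embedding that one document. The `IncrementalIndex` class and its `upsert` method below are hypothetical names, the data is made up, and `embed` is a toy word-count stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: unit-length word-count vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine(a, b):
    return sum(v * b.get(w, 0.0) for w, v in a.items())


class IncrementalIndex:
    """Hypothetical index keyed by document id, updated one document at a time."""

    def __init__(self):
        self.vectors = {}  # doc_id -> (vector, text)

    def upsert(self, doc_id, text):
        # Only this document is embedded; existing entries are left untouched.
        self.vectors[doc_id] = (embed(text), text)

    def best_match(self, query):
        q = embed(query)
        doc_id = max(self.vectors, key=lambda d: cosine(q, self.vectors[d][0]))
        return self.vectors[doc_id][1]


index = IncrementalIndex()
index.upsert("venue", "Tonight's game is at Riverside Arena.")
index.upsert("tickets", "Season tickets go on sale in March.")
before = index.best_match("Where is tonight's game?")

# New information arrives: overwrite one entry, no retraining involved.
index.upsert("venue", "Tonight's game has moved to Harbor Stadium.")
after = index.best_match("Where is tonight's game?")
print(before, "->", after)
```

The LLM itself never changes here; only the searchable data does.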
An additional benefit of RAG is that, because each retrieved passage in the vector database is linked to the document it came from, the generative AI can cite the specific source of the data in its answer, something an LLM on its own cannot reliably do. Therefore, if there's an inaccuracy in the generative AI's output, the document that contains that erroneous information can be quickly identified and corrected, and then the corrected information can be fed into the vector database.
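One way to picture source citation is to store a source identifier next to each passage so that it travels with the retrieval result. The `Chunk` dataclass, the `retrieve_with_sources` helper, and the source names below are hypothetical, and `embed` is a toy word-count stand-in for a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def embed(text):
    """Toy stand-in for an embedding model: unit-length word-count vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine(a, b):
    return sum(v * b.get(w, 0.0) for w, v in a.items())


@dataclass
class Chunk:
    text: str
    source: str  # hypothetical identifier of the originating document


def retrieve_with_sources(question, chunks, k=1):
    """Return the k best-matching chunks, each still carrying its source."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


chunks = [
    Chunk("Tonight's game will be played at Riverside Arena.", "schedule.pdf"),
    Chunk("Reporters expect a close matchup tonight.", "news-feed/2024-07-08"),
]
hits = retrieve_with_sources("Where will tonight's game be played?", chunks)
citations = [h.source for h in hits]
print(citations)
```

If the answer turns out to be wrong, the cited identifier points directly at the document that needs correcting and re-embedding.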
In short, RAG provides timeliness, context, and accuracy grounded in evidence to generative AI, going beyond what the LLM itself can provide.
RAG isn't the only technique used to improve the accuracy of LLM-based generative AI. A closely related technique is semantic search, which helps the AI system narrow down the meaning of a query by seeking a deep understanding of the specific words and phrases in the prompt.
Traditional search is focused on keywords. For example, a basic query asking about the tree species native to France might search the AI system's database using "trees" and "France" as keywords and find data that contains both keywords. But the system might not truly comprehend the meaning of trees in France and therefore may retrieve too much information, too little, or even the wrong information. That keyword-based search might also miss information because it is too literal: a document about the trees native to Normandy might be missed, even though Normandy is in France, because the document never mentions the keyword "France."
Semantic search goes beyond keyword search by determining the meaning of questions and source documents and using that meaning to retrieve more accurate results. Semantic search is an integral part of RAG.
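The Normandy example can be made concrete with a small sketch. Here a hand-written `CONCEPTS` table stands in for what a real system would learn with an embedding model; the table, the documents, and both search functions are assumptions made purely for illustration.

```python
import re

DOCS = [
    "Beech and oak are native to Normandy.",
    "France exports wine and cheese.",
]

# Hypothetical, hand-written concept table. A real semantic search system
# learns these relationships with an embedding model instead of listing them.
CONCEPTS = {
    "trees": "tree", "tree": "tree", "beech": "tree", "oak": "tree",
    "france": "france", "normandy": "france",
}


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(query, docs):
    """Return documents that literally contain every query word."""
    q = tokens(query)
    return [d for d in docs if q <= tokens(d)]


def concept_search(query, docs):
    """Return documents that cover every concept mentioned in the query."""
    def concepts(text):
        return {CONCEPTS[t] for t in tokens(text) if t in CONCEPTS}

    q = concepts(query)
    return [d for d in docs if q and q <= concepts(d)]


keyword_hits = keyword_search("trees France", DOCS)
concept_hits = concept_search("trees France", DOCS)
print("keyword:", keyword_hits)
print("concept:", concept_hits)
```

The literal keyword match finds nothing, because no document contains both "trees" and "France", while the meaning-based match finds the Normandy document.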
The evolution of Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) is poised for exciting developments:
Is RAG the same as generative AI?
No. Retrieval-augmented generation is a technique that can provide more accurate results to queries than a generative large language model on its own, because RAG draws on knowledge external to the data the LLM was trained on.
What type of information is used in RAG?
RAG can incorporate data from many sources, such as relational databases, unstructured document repositories, internet data streams, media news feeds, audio transcripts, and transaction logs.
How does generative AI use RAG?
Data from enterprise data sources is collected into a knowledge repository and then converted by an embedding model into vectors, which are stored in a vector database. When an end user makes a query, the vector database retrieves relevant contextual information. This contextual information, along with the query, is sent to the large language model, which uses the context to create a more timely, accurate, and contextual response.
Can RAG cite references for the data it retrieves?
Yes. The vector databases and knowledge repositories used by RAG contain specific information about the sources of information. This means that sources can be cited, and if there’s an error in one of those sources it can be quickly corrected or deleted so that subsequent queries won’t return that incorrect information.