RAG (retrieval-augmented generation) provides LLMs with access to outside knowledge, while tool calls enable LLMs to use external tools. Tool calling is sometimes also called tool use.
For RAG, the outside knowledge could be news, websites, R&D publications, or molecule data, for example. In the case of tool calls, the external tools could be software tools such as calculators, but also physical tools such as robots.
Typically, RAG uses the following steps:

1. Split your documents into chunks.
2. Embed the chunks and store them in a knowledge database (often a vector database).
3. Embed the user query and retrieve the chunks most similar to it.
4. Add the retrieved chunks to the LLM prompt as context.
5. Let the LLM generate an answer grounded in that context.
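To make these steps concrete, here is a minimal sketch in Python. It is illustrative only: `embed` is a stand-in for a real embedding model, and a plain list plays the role of the vector database.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: in practice, call a real embedding model here.
    # (Seeded random vectors just let the sketch run end to end.)
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(384)

documents = [
    "MSFT closed at an all-time high after strong cloud earnings.",
    "A new battery chemistry promises faster charging.",
]
# Steps 1-2: chunk (here: one chunk per document), embed, store.
store = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Step 3: embed the query and rank chunks by cosine similarity.
    q = embed(query)
    def score(item):
        vec = item[1]
        return float(np.dot(vec, q) / (np.linalg.norm(vec) * np.linalg.norm(q)))
    ranked = sorted(store, key=score, reverse=True)
    return [doc for doc, _ in ranked[:k]]

query = "What happened to Microsoft stock?"
context = "\n".join(retrieve(query))
# Step 4: add the retrieved chunks to the prompt.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# Step 5: send `prompt` to the LLM of your choice.
```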
Similar to how RAG connects LLMs to external knowledge, tool calls give LLMs the ability to use external tools. Basically, the goal of a tool call is to get from a user command like “Get me the fundamentals for MSFT as of today” to executing a tool like this:
```
get_fundamentals(ticker_symbol: string, date: string)
```
In this example, the role of the `get_fundamentals` call is to get fundamentals like market cap, average volume, etc. for a user-defined stock (`ticker_symbol`), e.g. MSFT (Microsoft), at a user-defined `date`, from an API that provides such data (an API is an interface that lets one computer or app get data from another). After the call, the LLM takes the data returned from the API, combines it with the user question (“Get me the fundamentals…”) and perhaps some extra instructions on how to respond, and writes a natural-language response. This response could look like this, for example:
Here are the fundamentals for MSFT on 15 April, 2024:
- MARKET CAP: 3.02T USD
- AVG VOLUME: 20.41M
- P/E RATIO: 35.20
- DIVIDEND YIELD: 0.74%
Notice that the LLM does not use the tool directly. Rather, it generates the (usually JSON) code that allows some other part of your software, not the LLM itself, to call the `get_fundamentals` tool.
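What that generated code looks like varies by provider, but for the example above it could be roughly this (a hypothetical schema, not any specific provider's format):

```json
{
  "tool": "get_fundamentals",
  "arguments": {
    "ticker_symbol": "MSFT",
    "date": "2024-04-15"
  }
}
```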
This distinction is important: it means you can build safety features against potentially harmful tool calls that are independent of the LLM. For example, if your tool call does something in the real world (think robots), you will most likely want guardrails that do not rely on the LLM itself.
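As an illustration of such an LLM-independent guardrail, the sketch below validates a generated tool call against a whitelist and argument rules before anything is executed. The tool stub and the validation rules are assumptions made for the example:

```python
import json
import re

def get_fundamentals(ticker_symbol: str, date: str) -> dict:
    # Stub: in practice this would call a market-data API.
    return {"ticker": ticker_symbol, "date": date, "market_cap": "3.02T USD"}

TOOLS = {"get_fundamentals": get_fundamentals}

# Per-tool argument validators; anything not listed here is rejected,
# no matter what the LLM generated.
VALIDATORS = {
    "get_fundamentals": {
        "ticker_symbol": lambda v: isinstance(v, str) and bool(re.fullmatch(r"[A-Z]{1,5}", v)),
        "date": lambda v: isinstance(v, str) and bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", v)),
    }
}

def dispatch(tool_call_json: str):
    call = json.loads(tool_call_json)
    name, args = call["tool"], call["arguments"]
    if name not in VALIDATORS:
        raise PermissionError(f"tool {name!r} is not allowed")
    rules = VALIDATORS[name]
    if set(args) != set(rules):
        raise ValueError("unexpected or missing arguments")
    for arg, value in args.items():
        if not rules[arg](value):
            raise ValueError(f"invalid value for {arg}: {value!r}")
    # Only now does anything actually get executed:
    return TOOLS[name](**args)

print(dispatch('{"tool": "get_fundamentals", '
               '"arguments": {"ticker_symbol": "MSFT", "date": "2024-04-15"}}'))
```

Because this validation layer runs outside the LLM, it holds even if the model is tricked into generating a malicious call.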
You can combine tool calling and RAG. For example, you could build another tool, `make_ticker_symbol_onepager`. This tool could get the fundamentals for a ticker symbol with `get_fundamentals`, then use RAG to get the latest news on the ticker symbol, and finally write a one-pager that combines all of this data.
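Here is a sketch of how such a combined tool could be wired up. It reuses `get_fundamentals` and the `retrieve` helper from the sketches above; `call_llm` stands in for whatever LLM client you use:

```python
def make_ticker_symbol_onepager(ticker_symbol: str, date: str) -> str:
    # Step 1: tool call for structured data.
    fundamentals = get_fundamentals(ticker_symbol, date)

    # Step 2: RAG for unstructured data, via the retrieve() helper
    # from the RAG sketch above.
    news = retrieve(f"latest news about {ticker_symbol}", k=3)

    # Step 3: let the LLM combine both into a one-pager.
    prompt = (
        f"Write a one-page summary for {ticker_symbol} as of {date}.\n"
        f"Fundamentals: {fundamentals}\n"
        "Recent news:\n" + "\n".join(f"- {n}" for n in news)
    )
    return call_llm(prompt)  # call_llm is a hypothetical LLM client
```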
Because LLMs give very convincing answers when asked a question, it is tempting to think of them as knowledge databases. But that is not how they are trained, and it is not what they are designed for. Rather, the objective of training an LLM is to memorize and compress knowledge (source). And when you compress knowledge, you forget some of its details. In fact, that's the point: you compress, and as a result you get reasoning. You generalize, you draw analogies, and so on. Compression seems to be a fundamental principle in human learning, too (source).
RAG takes this into account. It separates the knowledge access part from the reasoning part, and uses LLMs for the latter.
According to some estimates, more than 300 TB of new data are generated every day (source). By contrast, GPT-3 was apparently trained on ca. 45 TB of data (source). So even if newer models were trained on, say, 10× that amount, the new data produced every day would still outpace the LLM training process very quickly.
“Sure, but some day somebody will figure out a faster LLM training process that can keep up with new data produced.”
OK, but even if this happens, LLMs still won’t be knowledge bases but reasoning engines (see previous section).
Even if you could somehow “train new knowledge into an LLM”, this process, which is called fine-tuning, takes a while. By contrast, upserting new documents into a knowledge database is typically a matter of milliseconds. Not to mention that in a knowledge database you can use metadata such as document authors, dates, sources, etc. There is no straightforward way to do this with an LLM.
In most serious information-search scenarios, people want to know where a piece of information came from. With RAG, you can always provide the source of any piece of information. With an LLM alone, you cannot.
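Both points can be sketched with a plain in-memory store standing in for a real knowledge database: the upsert with metadata takes well under a millisecond, and the metadata lets you attach a source to every answer. All names and the URL below are made up for the example:

```python
import time

knowledge_db: dict[str, dict] = {}  # stand-in for a real knowledge database

def upsert(doc_id: str, text: str, metadata: dict) -> None:
    # Insert or update a document together with its metadata.
    knowledge_db[doc_id] = {"text": text, "metadata": metadata}

start = time.perf_counter()
upsert(
    "msft-earnings-2024-04-15",
    "Microsoft reported quarterly earnings above expectations.",
    {"author": "Jane Doe", "date": "2024-04-15",
     "source": "https://example.com/msft-earnings"},
)
print(f"upsert took {(time.perf_counter() - start) * 1000:.3f} ms")

# At answer time, every retrieved chunk carries its source along:
doc = knowledge_db["msft-earnings-2024-04-15"]
print(f"{doc['text']} (source: {doc['metadata']['source']})")
```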
I already mentioned tool calling for robots above. There are companies that build LLMs and other AI models specifically for interacting with the physical world. Here are some examples:
We use both RAG and tool calling across our products. In Spark, for example, each command is backed by dedicated tool calls and RAG processes. A command that focuses on startups, for instance, uses a different RAG process than a command that retrieves research papers.