LlamaIndex RAG-AGENT: Query and Summarize Over Database

Introduction

In this lesson, we explore the concept of agents in the LlamaIndex framework, with an emphasis on utilizing these agents as engines with internal reasoning and decision-making mechanisms. Creating an agent-based pipeline involves integrating our RAG-based application with data sources and various tools. It is essential to remember that developing these tools for the agents requires a deep understanding of how users are likely to engage with the application and the ability to predict potential usage patterns.

The goal of an RAG system is always to provide users with insightful content more effectively than extensive manual searches. Adding agents to our system is another step towards improving our product's user experience and decision-making ability.

The LlamaIndex framework offers numerous possibilities for combining agents and tools to enhance the abilities of Large Language Models. We will examine the implementation of OpenAI agents with various data sources. Additionally, we'll create custom functions to boost the agents' capabilities in areas where they may lack proficiency, such as mathematical operations. The rest of this lesson demonstrates how these agents make decisions and integrate various resources to formulate a response.

Before diving into the code, we must prepare our environment by installing the necessary packages and configuring the API keys. Execute the following command in your terminal to install the required packages using the Python package manager (pip). Next, run the subsequent Python script to configure the API keys in your environment. Remember to obtain the keys from the OpenAI and Activeloop platforms and substitute them for the placeholders. Before starting this guide, make sure you install all the requirements listed in the requirements section.

pip install -q llama-index deeplake openai cohere langchain tiktoken
The sample code.
import os
import getpass
os.environ['ACTIVELOOP_TOKEN'] = getpass.getpass('Enter your ActiveLoop API token: ')
os.environ['OPENAI_API_KEY'] = getpass.getpass('Enter your OpenAI API key: ')
The sample code.

Now, let’s go through the next steps in detail.

OpenAI Agent

Step 1: Defining Data Sources

In RAG, discussing datasets mainly means discussing data sources, and it pays off to tag and track them right from the start. This means keeping track of the general source of the data, whether it comes from a specific book, documentation, or a blog. For example, the Towards AI RAG AI tutor currently has 5 “data sources”: Towards AI blogs, Activeloop documentation, LlamaIndex documentation, LangChain documentation, and HuggingFace documentation. Later, when we increase the dataset size with new data points, we add them to those sources or create new ones. Doing this from the start will improve the chatbot’s efficiency by allowing “routers” to focus on the information source most relevant to a question.
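As a minimal sketch of what such tagging can look like in LlamaIndex, the snippet below attaches a data_source label to each Document's metadata. The source names and texts here are illustrative placeholders, not part of this lesson's dataset.

from llama_index.core import Document

# Illustrative only: tag each document with the data source it came from,
# so retrieval and routing can later be scoped to a specific source.
raw_texts = {
    "towardsai_blog": "Retrieval-augmented generation combines retrieval with ...",
    "llamaindex_docs": "A VectorStoreIndex stores embeddings of your documents ...",
}

documents = [
    Document(text=text, metadata={"data_source": source})
    for source, text in raw_texts.items()
]

for doc in documents:
    print(doc.metadata["data_source"], "->", len(doc.text), "characters")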

A key step in building a data-driven application with the LlamaIndex RAG system is discussing and selecting the appropriate dataset. The quality and relevance of the data are fundamental, as they directly influence the system's performance. A well-chosen dataset is essential to showcase and test the effectiveness of our RAG system accurately. The process is largely the same whether the files are stored locally or hosted online in a vector store database like Deep Lake. Note, however, that online tools like Deep Lake offer built-in features to easily visualize, query, track, and manage your data.

It is a good practice to start your RAG pipeline design with a small dataset, such as web articles. Setting up a foundational data environment that is manageable yet sufficiently rich is critical to ensuring a smooth start. This way, you can quickly test, debug, and, most importantly, understand your RAG system. You can easily query and evaluate responses on a dataset you control and grasp.

The dataset for this lesson covers Nikola Tesla's life, work, and legacy, with detailed information about his innovations, personal history, and impact. We employ two text documents: the first contains bold predictions about the future that Tesla made during his lifetime, and the second contains biographical details about his life. Let's import the files and set up the indexes. We will mix data sources, storing the first file's indexes in the Deep Lake vector store and the second file's indexes in local storage.

The initial step involves downloading the documents using the wget command. Alternatively, you can access and manually save the files from the URLs below.

!mkdir -p 'data/1k/'
!wget 'https://raw.githubusercontent.com/idontcalculate/data-repo/main/machine_to_end_war.txt' -O './data/1k/tesla.txt'
!wget 'https://raw.githubusercontent.com/idontcalculate/data-repo/main/prodigal_chapter10.txt' -O './data/1k/web.txt'
The sample code.

Store Indexes on Deep Lake

As previously stated, we'll read the first text file and process it for storage in Deep Lake. The SimpleDirectoryReader class in LlamaIndex can browse through a directory and transform text files into a Document object, facilitating processing.

from llama_index.core import SimpleDirectoryReader

tesla_docs = SimpleDirectoryReader( input_files=["data/1k/tesla.txt"] ).load_data()
The sample code.

We are now ready to establish a database on the Activeloop platform by specifying the organization ID (which defaults to your username) and naming the database. The DeepLakeVectorStore class is used to create an empty database.

%pip install llama-index-vector-stores-deeplake
%pip install llama-index-llms-openai
%pip install llama-index-agent-openai
from llama_index.vector_stores.deeplake import DeepLakeVectorStore

# By default, the organization id is your username.
my_activeloop_org_id = "<YOUR_ORGANIZATION_ID>"
my_activeloop_dataset_name = "LlamaIndex_tesla_predictions"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"

# Create an empty Deep Lake vector store to hold the document indexes
vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=False)
The sample code.
Your Deep Lake dataset has been successfully created!
The output.

Then, we can utilize the database object to create a storage context, allowing us to generate indexes (embeddings) and insert them into the database using the VectorStoreIndex class.

from llama_index.core.storage.storage_context import StorageContext
from llama_index.core import VectorStoreIndex

storage_context = StorageContext.from_defaults(vector_store=vector_store)
tesla_index = VectorStoreIndex.from_documents(tesla_docs, storage_context=storage_context)
The sample code.
Uploading data to deeplake dataset.
100%|██████████| 5/5 [00:00<00:00,  7.17it/s]
/Dataset(path='hub://genai360/LlamaIndex_tesla_predictions', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
   text       text      (5, 1)      str     None   
 metadata     json      (5, 1)      str     None   
 embedding  embedding  (5, 1536)  float32   None   
    id        text      (5, 1)      str     None
The output.

The index we have created for the first file is ready to be integrated as a source in the pipeline. However, we must also process the second file before proceeding.

Store Indexes Locally

The method to save the index on your hard drive begins similarly to our earlier demonstration, employing the SimpleDirectoryReader class.

webtext_docs = SimpleDirectoryReader(input_files=["data/1k/web.txt"]).load_data()
The sample code.

Just as we utilized the StorageContext class earlier for employing the DeepLake database as storage, we can apply the same configuration but specify a directory to store the indexes. The following script initially attempts to load any pre-existing indexes if they were previously computed. If not, it uses the .persist() method to store the indexes. As indicated by the output, the index is generated. If you execute this code block again, it will retrieve the stored checkpoint instead of reprocessing and regenerating indexes.

from llama_index.core import StorageContext, load_index_from_storage
try:
  # Try to load the index if it is already calculated
  storage_context = StorageContext.from_defaults( persist_dir="storage/webtext" )
  webtext_index = load_index_from_storage(storage_context)
  print("Loaded the pre-computed index.")
except:
  # Otherwise, generate the indexes
  webtext_index = VectorStoreIndex.from_documents(webtext_docs)
  webtext_index.storage_context.persist(persist_dir="storage/webtext")
  print("Generated the index.")
The sample code.
Generated the index.
The output.

With data acquired from two distinct sources, let's utilize the query engine and its tools to develop an agent capable of integrating this information.

Step 2: Query Engine

Once the index is established, the query engine used for searching and retrieving data from the index can be efficiently set up.

tesla_engine = tesla_index.as_query_engine(similarity_top_k=3)
webtext_engine = webtext_index.as_query_engine(similarity_top_k=3)
The sample code.

The similarity_top_k parameter is set to 3, which means the search returns the three most similar results for a given query. As previously mentioned, we build a query engine for each of the two distinct data sources.

  1. The tesla_engine variable handles queries about Tesla's statements and predictions about the future.
  2. The webtext_engine variable processes biographical data, focusing on factual content about Tesla's life.

This separation of data types leads to higher-quality retrieval, instead of always fetching from both sources with equal weight. With the query engines now constructed, the tools can be configured; before doing so, we can run a quick sanity check, as sketched below.
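This optional check queries one of the engines directly, before any agent is involved; the question is only illustrative.

# Optional sanity check: query one engine directly before adding an agent.
response = tesla_engine.query("What predictions did Tesla make about the future?")
print(response)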

We can use a combination of the QueryEngineTool class to create a new tool that includes a query engine and the ToolMetadata class, which assists in assigning names and descriptions to the tools. These descriptions will help the agent determine the most suitable data source based on the user's query. We will create a list of two tools, each representing one of our data sources.

from llama_index.core.tools import QueryEngineTool, ToolMetadata

query_engine_tools = [
    QueryEngineTool(
        query_engine=tesla_engine,
        metadata=ToolMetadata(
            name="tesla_1k",
            description=(
                "Provides information about Tesla's statements that refers to future times and predictions. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=webtext_engine,
        metadata=ToolMetadata(
            name="webtext_1k",
            description=(
                "Provides information about tesla's life and biographical data. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]
The sample code.

To picture our current system schematically: the query engine is the primary component orchestrating everything. It sits between the data sources and the process of formulating the final answer, acting as a bridge between the questions posed and their respective answers.

After establishing the basic RAG mechanism with LlamaIndex, the next step is integrating an agent. This addition enables easy testing of the retrieval system. We can then add system design improvements and feature enhancements once the core functionality has been tested and verified.

Step 3: The Agent

Now, let’s set up our agent. In this case, it will be the OpenAI agent. Integrating the query engine tools into the OpenAIAgent module from LlamaIndex enables the agent to execute queries. Setting the verbose argument to True is excellent for debugging: it lets us investigate which tool the agent is using and see the intermediate steps. You can set the argument to False to receive only the final output.

from llama_index.agent.openai import OpenAIAgent
agent = OpenAIAgent.from_tools(query_engine_tools, verbose=True)
The sample code.

And that’s it! Now that we have our agent, we can execute an interactive chat interface (REPL, Read-Eval-Print Loop) where the agent can receive inputs (like questions or prompts), process them, and return responses, making it a conversational agent capable of handling a dialogue or chat session.

agent.chat_repl()
The sample code.
===== Entering Chat REPL =====
Type "exit" to exit.

Human: What influenced Nikola Tesla to become an inventor?
STARTING TURN 1
---------------

=== Calling Function ===
Calling function: webtext_1k with args: {
"input": "What influenced Nikola Tesla to become an inventor?"
}
Got output: Nikola Tesla was influenced to become an inventor by his studies of mechanical vibrations. He observed the selective response of objects to vibrations and realized the potential for producing effects of tremendous magnitude on physical objects. This led him to pursue research in the field of high-frequency and high-potential currents, which eventually resulted in his groundbreaking inventions.
========================

STARTING TURN 2
---------------

Assistant: Nikola Tesla was influenced to become an inventor by his studies of mechanical vibrations. He observed the selective response of objects to vibrations and realized the potential for producing effects of tremendous magnitude on physical objects. This led him to pursue research in the field of high-frequency and high-potential currents, which eventually resulted in his groundbreaking inventions.

Human: exit
The output.
💡
To debug tools in development, a practical approach is to query the agent about its tools. This includes asking the agent to
  • detail the tools at its disposal,
  • explain the arguments these tools accept and the significance of those arguments,
  • and describe the intended use of each tool.

We can then analyze the agent's responses to identify prompt deficiencies or understand why the agent might struggle to utilize a tool under development effectively.
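As a minimal sketch of this debugging approach (the exact wording of the question is up to you), we can simply ask the agent to describe its own tools:

# Ask the agent to describe its own tools and inspect the answer for gaps.
debug_response = agent.chat(
    "List the tools you have access to, the arguments each tool accepts, "
    "and when you would use each of them."
)
print(debug_response)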

Agents with Custom Functions

We explored the potential of creating Query Engine tools to enhance an OpenAI-based agent with additional data sources. We observed the capability of agents to select the appropriate tool based on the user's prompt. This decision-making ability can be applied across a broad range of applications.

For instance, one area where Large Language Models typically fall short is mathematical operations. A basic addition or subtraction equation, which may seem straightforward to many, can be challenging for these models. A practical solution to this issue is to equip the models with tools like a calculator for use as needed. This section will create a custom function that a chatbot can access for essential multiplication or addition calculations whenever required.

Initially, we must define a custom function tailored to each task. These custom functions can accept an arbitrary number of inputs and generate an output. Their capabilities can range from a simple addition operation, as in our example, to more complex tasks such as conducting web searches, querying other Large Language Models, or utilizing data from external APIs to answer a question.

def multiply(a: int, b: int) -> int:
    """Multiply two integers and returns the result integer"""
    return a * b

def add(a: int, b: int) -> int:
    """Add two integers and returns the result integer"""
    return a + b

from llama_index.core.tools import FunctionTool

multiply_tool = FunctionTool.from_defaults(fn=multiply, name="multiply")
add_tool = FunctionTool.from_defaults(fn=add, name="add")

all_tools = [multiply_tool, add_tool]
The sample code.

The above code defines two functions, 'add' and 'multiply'. It is crucial in this setup to specify data types for the input arguments (a: int, b: int), the return type of the function (-> int), and a concise explanation of the function's purpose, provided in the docstring beneath the function signature. These details are used by the FunctionTool class’s .from_defaults() method to form a description of each function, which can then be used by the agent. The final variable holds a list of all the available tools.
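To verify what the agent will actually see, we can inspect the metadata that FunctionTool derived from the function signatures and docstrings. This check is optional and not part of the original pipeline.

# Optional: print the name and description generated for each tool.
for tool in all_tools:
    print(tool.metadata.name)
    print(tool.metadata.description)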

These tools can be used to construct an ObjectIndex, a wrapper class that links a VectorStoreIndex with multiple possible tools. First, we use the SimpleToolNodeMapping class to transform the tool implementations into nodes and then tie everything together.

from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex, SimpleToolNodeMapping

tool_mapping = SimpleToolNodeMapping.from_objects(all_tools)
obj_index = ObjectIndex.from_objects(
    all_tools,
    tool_mapping,
    VectorStoreIndex,
)
The sample code.

Note that we do not incorporate any data source in this implementation. This approach is intentional, as we aim to enhance the capabilities of Large Language Models with additional tools. In the next code block, you will see that we are utilizing the defined object index as a retriever! This implies that within the LlamaIndex framework, the custom functions are treated as additional data sources. So, we define the agent object using the FnRetrieverOpenAIAgent class.

%pip install llama-index-agent-openai-legacy
from llama_index.agent.openai_legacy import FnRetrieverOpenAIAgent

agent = FnRetrieverOpenAIAgent.from_retriever(
    obj_index.as_retriever(), verbose=True
)
The sample code.

Finally, we can ask the agent questions, and it utilizes the multiply function to provide the answer.

agent.chat("What's 12 multiplied by 22? Make sure to use Tools")
The sample code.
STARTING TURN 1
---------------

=== Calling Function ===
Calling function: multiply with args: {
  "a": 12,
  "b": 22
}
Got output: 264
========================

STARTING TURN 2
---------------

AgentChatResponse(response='12 multiplied by 22 is 264.', sources=[ToolOutput(content='264', tool_name='multiply', raw_input={'args': (), 'kwargs': {'a': 12, 'b': 22}}, raw_output=264)], source_nodes=[])
The output.

In the previous example, we specified in the prompt that the agent should utilize the tools. Additionally, it's possible to employ the tool_choice argument to explicitly direct the agent to use specific tools or to use the auto keyword to let the agent decide.

response = agent.chat( "What is 5 + 2?", tool_choice="add" )
The sample code.
STARTING TURN 1
---------------

=== Calling Function ===
Calling function: add with args: {
  "a": 5,
  "b": 2
}
Got output: 7
========================

STARTING TURN 2
---------------

AgentChatResponse(response='5 + 2 is equal to 7.', sources=[ToolOutput(content='7', tool_name='add', raw_input={'args': (), 'kwargs': {'a': 5, 'b': 2}}, raw_output=7)], source_nodes=[])
The output.
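The AgentChatResponse object shown above can also be unpacked programmatically; the attribute names below simply mirror the fields visible in the printed output.

print(response.response)  # '5 + 2 is equal to 7.'
for tool_output in response.sources:
    print(tool_output.tool_name, tool_output.raw_output)  # add 7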

Agents from LlamaHub

Agents can offer a broad range of functionalities, significantly extending the capabilities of Large Language Models into unexplored realms. LlamaHub streamlines the curation, sharing, and usage of more than 30 agents, achievable with just one line of code. We have already explored its application for scraping data from Wikipedia in the LlamaIndex Unlocked lesson. A complete list of implemented agents is available on LlamaHub.
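As a minimal sketch of how such an agent can be assembled (assuming the llama-index-tools-wikipedia package, which is not part of this lesson's requirements), a LlamaHub tool spec can be converted into a list of tools and handed to an OpenAI agent:

%pip install llama-index-tools-wikipedia
from llama_index.tools.wikipedia import WikipediaToolSpec
from llama_index.agent.openai import OpenAIAgent

# Assumes the llama-index-tools-wikipedia package is installed.
# Convert the tool spec into a list of tools and build an agent around them.
wiki_tools = WikipediaToolSpec().to_tool_list()
wiki_agent = OpenAIAgent.from_tools(wiki_tools, verbose=True)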

💡
We have reviewed the fundamentals of agents and tools by examining a selection of widely used agents in LlamaIndex. For more options and details, you can refer to the documentation.

Conclusion

In this lesson, we discussed how to utilize agents to enhance the capabilities of Large Language Models by integrating new tools that unlock their potential. We experimented with employing these agents as decision-making functions to incorporate various data sources in response to user queries. Additionally, we explored their use as reasoning machines, combined with custom functions, to further amplify their abilities. The ability to make function calls is a potent aspect of designing agents, enabling the easy integration of additional information into the model from virtually any imaginable resource.

>> Notebook.

Resources:

  • The RAG-AGENT example notebook
  • LlamaHub on GitHub
  • Data agents