Building Autonomous Agents to Create Analysis Reports

Introduction

In this lesson, our aim is to create an autonomous agent using the LangChain framework. We will explore the concept of "Plan and Execute" LangChain agents and their ability to generate insightful analysis reports based on retrieved documents from Deep Lake.

We will start by understanding the fundamentals of the "Plan and Execute" LangChain agent framework and its benefits for complex long-term planning. Then, we will delve into our project's implementation details and workflow.

By the end of the lesson, you will have a solid understanding of building autonomous agents using the LangChain framework and be equipped with the skills to create analysis reports using them.

Workflow

This is the workflow we’ll follow in this project:

  1. Saving Documents on Deep Lake: We will begin by learning how to save documents on Deep Lake, which serves as our knowledge repository. Deep Lake provides information that our agents can leverage for analysis and report generation.
  2. Creating a Document Retrieval Tool: Next, we will develop a tool that enables our agent to retrieve the most relevant documents from Deep Lake based on a given query.
  3. Using the Plan and Execute Agent: The core of our project involves employing a "Plan and Execute" agent to devise a plan for answering a specific query about creating an overview of a topic. Our specific objective is to generate a comprehensive outline of recent events related to Artificial Intelligence regulations by governments, but the final agent could also work for other, similar objectives.

To accomplish this, we will feed the query into the planner component of the agent, which will utilize a language model's reasoning ability to plan out the required steps. The planner will consider various factors, such as the complexity of the query and the instructions of the available tools, to generate a step-by-step plan of lower-level queries.

The plan will then be passed to the executor component, which will determine the appropriate tools or actions required to execute each step of the plan. The executor, initially implemented as an Action Agent, will make use of the tools we developed earlier, such as the document retrieval tool, to gather relevant information and execute the plan.

By employing the "Plan and Execute" agent framework, we can achieve more accurate and reliable analysis reports while handling complex long-term planning scenarios.

So let's dive in and explore the potential for generating insightful analysis reports!

Plan and Execute

Plan and Execute agents are a new type of agent executor offering a different approach than the traditional agents supported in LangChain. These agents are heavily inspired by the BabyAGI framework and the recent Plan-and-Solve paper. The primary goal of "Plan and Execute" agents is to enable more complex long-term planning, even at the cost of making more calls to the language model.

  • The planner in the "Plan-and-Execute" framework typically utilizes a language model's reasoning ability to plan out steps and handle ambiguity and edge cases.
  • The executor, initially an Action Agent, takes the planner's high-level objectives (steps) and determines the tools or actions required to accomplish each step.

This separation of planning and execution allows for improved reliability and flexibility. It also facilitates the possibility of replacing these components with smaller, fine-tuned models in the future.

We will explore the implementation of the "Plan and Execute" agent, integrate it with Deep Lake for document retrieval, and see the agent in action as it generates an analysis report based on a given query.

Implementation

Let’s set up the OpenAI API and Activeloop keys in environment variables.

import os

os.environ["OPENAI_API_KEY"] = "<YOUR-OPENAI-API-KEY>"
os.environ["ACTIVELOOP_TOKEN"] = "<YOUR-ACTIVELOOP-TOKEN>"
💡
LangChain has recently (as of August 2023) moved some classes from "langchain.experimental" to a separate library called "langchain-experimental", in an attempt to make the "langchain" library smaller. If you run the following code with the suggested version "langchain==0.0.208" it should work fine, but if you want to run it with the latest LangChain version, then you have to (1) install the experimental library with "pip install langchain-experimental" and (2) replace all the occurrences of "langchain.experimental" with "langchain_experimental".
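
For instance, the Plan and Execute imports used later in this lesson would change as follows (a minimal sketch of the renaming, shown for reference):

# With langchain==0.0.208 (the version used in this lesson):
from langchain.experimental.plan_and_execute import PlanAndExecute, load_agent_executor, load_chat_planner

# With recent LangChain versions (after "pip install langchain-experimental"):
# from langchain_experimental.plan_and_execute import PlanAndExecute, load_agent_executor, load_chat_planner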

We then use the requests library to send HTTP requests and the newspaper library for article parsing. By iterating over a list of article URLs, the code downloads the HTML of each webpage, extracts the article text, and stores it along with the corresponding URL. We could also load our private files on Deep Lake, but for this project's scope, we’ll upload content downloaded from public web pages.

# We scrape several Artificial Intelligence news

import requests
from newspaper import Article # https://github.com/codelucas/newspaper
import time

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
}

article_urls = [
    "https://www.artificialintelligence-news.com/2023/05/23/meta-open-source-speech-ai-models-support-over-1100-languages/",
    "https://www.artificialintelligence-news.com/2023/05/18/beijing-launches-campaign-against-ai-generated-misinformation/"
    "https://www.artificialintelligence-news.com/2023/05/16/openai-ceo-ai-regulation-is-essential/",
    "https://www.artificialintelligence-news.com/2023/05/15/jay-migliaccio-ibm-watson-on-leveraging-ai-to-improve-productivity/",
    "https://www.artificialintelligence-news.com/2023/05/15/iurii-milovanov-softserve-how-ai-ml-is-helping-boost-innovation-and-personalisation/",
    "https://www.artificialintelligence-news.com/2023/05/11/ai-and-big-data-expo-north-america-begins-in-less-than-one-week/",
    "https://www.artificialintelligence-news.com/2023/05/11/eu-committees-green-light-ai-act/",
    "https://www.artificialintelligence-news.com/2023/05/09/wozniak-warns-ai-will-power-next-gen-scams/",
    "https://www.artificialintelligence-news.com/2023/05/09/infocepts-ceo-shashank-garg-on-the-da-market-shifts-and-impact-of-ai-on-data-analytics/",
    "https://www.artificialintelligence-news.com/2023/05/02/ai-godfather-warns-dangers-and-quits-google/",
    "https://www.artificialintelligence-news.com/2023/04/28/palantir-demos-how-ai-can-used-military/",
    "https://www.artificialintelligence-news.com/2023/04/26/ftc-chairwoman-no-ai-exemption-to-existing-laws/",
    "https://www.artificialintelligence-news.com/2023/04/24/bill-gates-ai-teaching-kids-literacy-within-18-months/",
    "https://www.artificialintelligence-news.com/2023/04/21/google-creates-new-ai-division-to-challenge-openai/"
]

session = requests.Session()
pages_content = [] # where we save the scraped articles

for url in article_urls:
    try:
        time.sleep(2) # sleep two seconds for gentle scraping
        response = session.get(url, headers=headers, timeout=10)

        if response.status_code == 200:
            article = Article(url)
            article.download() # download HTML of webpage
            article.parse() # parse HTML to extract the article text
            pages_content.append({ "url": url, "text": article.text })
        else:
            print(f"Failed to fetch article at {url}")
    except Exception as e:
        print(f"Error occurred while fetching article at {url}: {e}")

If an error occurs while fetching an article, we catch the exception and print an error message. This ensures that even if one article fails to download, the rest of the articles can still be processed.
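
As a quick sanity check (not part of the original listing), we can print how many articles were successfully scraped; the exact count may vary if some downloads fail:

# Check how many articles were scraped successfully
print(f"Scraped {len(pages_content)} articles")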

Then, we import the OpenAIEmbeddings class, which will be used to compute embeddings for our documents. We also import the DeepLake class from the langchain.vectorstores module, which will serve as the storage for our documents and their embeddings.

By setting up the Deep Lake instance with a specified dataset path and the embedding_function parameter set to the OpenAIEmbeddings instance, we establish a connection to Deep Lake and configure it to use the specified embedding model for computing document embeddings. Remember to install the required packages with the following command: pip install langchain==0.0.208 deeplake openai==0.27.8 tiktoken.

# We'll use an embedding model to compute our documents' embeddings
from langchain.embeddings.openai import OpenAIEmbeddings

# We'll store the documents and their embeddings in the deep lake vector db
from langchain.vectorstores import DeepLake

# Setup deep lake
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

# create Deep Lake dataset
# TODO: use your organization id here. (by default, org id is your username)
my_activeloop_org_id = "<YOUR-ACTIVELOOP-ORG-ID>"
my_activeloop_dataset_name = "langchain_course_analysis_outline"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)

Next, we create an instance of RecursiveCharacterTextSplitter with specified chunk_size and chunk_overlap parameters. Then, we iterate over the pages_content and use the split_text method of the text_splitter to split each article text into chunks. These chunks are appended to the all_texts list, resulting in a collection of smaller text chunks derived from the original articles.

# We split the article texts into small chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

all_texts = []
for d in pages_content:
    chunks = text_splitter.split_text(d["text"])
    for chunk in chunks:
        all_texts.append(chunk)

Now we can add those chunks to the Deep Lake database.

# we add all the chunks to the Deep lake
db.add_texts(all_texts)
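
Note that add_texts returns the IDs of the inserted texts, so a minimal variation lets us confirm how many chunks were uploaded:

# Variation: capture the returned IDs to confirm the upload
ids = db.add_texts(all_texts)
print(f"Uploaded {len(ids)} chunks to Deep Lake")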

We are done with setting up the Deep Lake dataset with our documents! Let’s now focus on building the “Plan and Execute” agent that will leverage it. First, we create a retriever from the Deep Lake dataset and a function for our custom tool that retrieves the most similar documents to a query from the dataset.

# Get the retriever object from the deep lake db object and set the number
# of retrieved documents to 3
retriever = db.as_retriever()
retriever.search_kwargs['k'] = 3

# We define some variables that will be used inside our custom tool
CUSTOM_TOOL_DOCS_SEPARATOR = "\n---------------\n" # how to join together the retrieved docs to form a single string

# This is the function that defines our custom tool that retrieves relevant
# docs from Deep Lake
def retrieve_n_docs_tool(query: str) -> str:
    """Searches for relevant documents that may contain the answer to the query."""
    docs = retriever.get_relevant_documents(query)
    texts = [doc.page_content for doc in docs]
    texts_merged = "---------------\n" + CUSTOM_TOOL_DOCS_SEPARATOR.join(texts) + "\n---------------"
    return texts_merged

We retrieve the retriever object from the Deep Lake database and set the number of retrieved documents to 3. This matters for the "Plan and Execute" agent because it controls how many relevant documents are retrieved from Deep Lake for a given query.
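
The same configuration can also be passed when creating the retriever, which should be equivalent:

# Equivalent: set the number of retrieved documents at creation time
retriever = db.as_retriever(search_kwargs={"k": 3})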

We also define a custom tool function called retrieve_n_docs_tool that takes a query as input and uses the retriever to search for relevant documents containing the answer to the query.

The retrieved document texts are then merged using the CUSTOM_TOOL_DOCS_SEPARATOR variable, representing the separator string used to join the documents into a single string. The merged text is returned as the output of the custom tool function.

This functionality enables the "Plan and Execute" agent to retrieve and process relevant documents for further analysis and decision-making.
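
Before wrapping the function in a LangChain tool, we can call it directly with a sample query (a hypothetical query, just as a sanity check):

# Quick manual test of the custom tool with a sample query
print(retrieve_n_docs_tool("What is the EU AI Act?"))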

from langchain.agents.tools import Tool

# We create the tool that uses the "retrieve_n_docs_tool" function
tools = [
    Tool(
        name="Search Private Docs",
        func=retrieve_n_docs_tool,
        description="useful for when you need to answer questions about current events about Artificial Intelligence"
    )
]

The tool is named "Search Private Docs," and its functionality is based on the retrieve_n_docs_tool function. The purpose of this tool is to provide a way to search for and retrieve relevant documents from Deep Lake in order to answer questions about current events related to Artificial Intelligence. The tool is described as being useful in situations where there is a need to gather information and insights from private documents.

We are now ready to create the agent!

from langchain.chat_models import ChatOpenAI
from langchain.experimental.plan_and_execute import PlanAndExecute, load_agent_executor, load_chat_planner

# let's create the Plan and Execute agent
model = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
planner = load_chat_planner(model)
executor = load_agent_executor(model, tools, verbose=True)
agent = PlanAndExecute(planner=planner, executor=executor, verbose=True)

The agent consists of two components: a planner and an executor. The planner is responsible for generating a plan based on the given input, and the executor executes the plan by interacting with the tools and external systems. The agent is set to be verbose, which means it will provide detailed information and logs during its operation.

# we test the agent
response = agent.run("Write an overview of Artificial Intelligence regulations by governments by country")

You should see something like the following output. Here we split it into multiple sections and comment on them individually, keeping only the most relevant ones.

> Entering new PlanAndExecute chain...
steps=[Step(value='Research the current state of Artificial Intelligence (AI) regulations in various countries.'), Step(value='Identify the key countries with significant AI regulations or ongoing discussions about AI regulations.'), Step(value='Summarize the AI regulations or discussions in each identified country.'), Step(value='Organize the information by country, providing an overview of the AI regulations in each.'), Step(value='Given the above steps taken, provide an overview of Artificial Intelligence regulations by governments by country.\n')]

At first, the planning agent creates a plan for our query consisting of multiple steps. Each step is a query that the action agent will be asked to answer. Here are the identified steps.

  • Research the current state of Artificial Intelligence (AI) regulations in various countries.
  • Identify the key countries with significant AI regulations or ongoing discussions about AI regulations.
  • Summarize the AI regulations or discussions in each identified country.
  • Organize the information by country, providing an overview of the AI regulations in each.
  • Given the above steps taken, provide an overview of Artificial Intelligence regulations by governments by country.

Let’s see how the output continues.

> Entering new AgentExecutor chain...Action:
```
{
  "action": "Search Private Docs",
  "action_input": "current state of Artificial Intelligence regulations in various countries"
}
```
Observation: ---------------
“US-based AI developers will likely steal a march on their European competitors given the news that the EU parliamentary committees have green-lit its groundbreaking AI Act, where AI systems will need to be categorized according to their potential for harm from the outset. The US tech approach is typically to experiment first and, once market and product fit is established, to retrofit to other markets and their regulatory framework. This approach fosters innovation whereas EU-based AI developers will need to take note of the new rules and develop systems and processes which may take the edge off their ability to innovate. The UK is adopting a similar approach to the US, although the proximity of the EU market means that UK-based developers are more likely to fall into step with the EU ruleset from the outset. However, the potential to experiment in a safe space – a regulatory sandbox – may prove very attractive.”
---------------
To boost AI innovation, MEPs added exemptions to these rules for research activities and AI components provided under open-source licenses. The new law also promotes regulatory sandboxes – or controlled environments established by public authorities – to test AI before its deployment.

MEPs want to boost citizens’ right to file complaints about AI systems and receive explanations of decisions based on high-risk AI systems that significantly impact their rights. MEPs also reformed the role of the EU AI Office, which would be tasked with monitoring how the AI rulebook is implemented.

Tim Wright, Tech and AI Regulatory Partner at London-based law firm Fladgate commented:
---------------
Ryan is a senior editor at TechForge Media with over a decade of experience covering the latest technology and interviewing leading industry figures. He can often be sighted at tech conferences with a strong coffee in one hand and a laptop in the other. If it's geeky, he’s probably into it. Find him on Twitter (@Gadget_Ry) or Mastodon (@gadgetry@techhub.social)

The Internal Market Committee and the Civil Liberties Committee of the European Parliament have endorsed new transparency and risk-management rules for artificial intelligence systems known as the AI Act.

This marks a major step in the development of AI regulation in Europe, as these are the first-ever rules for AI. The rules aim to ensure that AI systems are safe, transparent, traceable, and non-discriminatory.

After the vote, co-rapporteur Brando Benifei (S&D, Italy) said:
---------------
Thought: I have information about the current state of AI regulations in the EU, US, and UK. I can provide a summary of the AI regulations in these regions.
Action:
```
{
  "action": "Final Answer",
  "action_input": "The current state of AI regulations varies across countries. In the European Union, the AI Act has been endorsed by the Internal Market Committee and the Civil Liberties Committee, marking a major step in AI regulation. The AI Act aims to ensure that AI systems are safe, transparent, traceable, and non-discriminatory. In the United States, AI developers typically experiment first and then retrofit their products to other markets and regulatory frameworks, fostering innovation. The UK is adopting a similar approach to the US, but its proximity to the EU market means that UK-based developers are more likely to align with the EU ruleset from the outset. Regulatory sandboxes, or controlled environments established by public authorities, are being promoted to test AI before deployment."
}
```
> Finished chain.
*****

In this step, we see the executor agent searching for relevant documents with our Deep Lake tool and showing the top 3 retrieved chunks. After analyzing them, the agent decides to return the final answer to the question, which is:

The current state of AI regulations varies across countries. In the European Union, the AI Act has been endorsed by the Internal Market Committee and the Civil Liberties Committee, marking a major step in AI regulation. The AI Act aims to ensure that AI systems are safe, transparent, traceable, and non-discriminatory. In the United States, AI developers typically experiment first and then retrofit their products to other markets and regulatory frameworks, fostering innovation. The UK is adopting a similar approach to the US, but its proximity to the EU market means that UK-based developers are more likely to align with the EU ruleset from the outset. Regulatory sandboxes, or controlled environments established by public authorities, are being promoted to test AI before deployment.

Here we omit the outputs of all the other steps done by the executor agent.

Let’s print the final response from the agent.

print(response)
European Union: The AI Act has been endorsed by the Internal Market Committee and the Civil Liberties Committee, aiming to ensure AI systems are safe, transparent, traceable, and non-discriminatory.

United States: AI developers typically experiment first and then retrofit their products to other markets and regulatory frameworks, fostering innovation.

United Kingdom: The UK is adopting a similar approach to the US, but its proximity to the EU market means that UK-based developers are more likely to align with the EU ruleset from the outset. Regulatory sandboxes are being promoted to test AI before deployment.

We see that the agent has been able to iteratively create an overview of AI regulations by country, leveraging several retrieved documents.

Conclusion

The experiment involving the Plan and Execute agent has been successful in providing a comprehensive overview of Artificial Intelligence regulations by governments, specifically by finding information about the European Union, United States, and United Kingdom. The agent effectively performed various steps, including researching the current state of AI regulations, identifying key countries, summarizing regulations, and organizing the information by country.

The output generated by the agent demonstrates its ability to understand and interpret complex information about AI regulations. It accurately summarizes the AI regulations in each country, highlighting the endorsement of the AI Act in the European Union to ensure the safety, transparency, traceability, and non-discrimination of AI systems.

The agent successfully executes its plan by retrieving relevant information, summarizing it, and providing a concise and informative overview. It demonstrates its capability to gather insights from multiple sources and deliver a coherent response. The agent's performance in this experiment highlights its potential to assist with research, generate informative summaries, and provide valuable insights.

In the next lesson we’ll learn about recent developments and trends about LLM-based agents.
