Exploring the Fascinating World of Autonomous Agents: A Closer Look at AutoGPT and BabyAGI

Introduction

AutoGPT and BabyAGI are two exciting developments in the world of autonomous agents, which are AI systems designed to carry out tasks without needing constant human guidance. These agents are making waves due to their ability to work independently toward a specific objective. Their introduction has generated considerable hype, with over 100k stars on GitHub, and they've been heralded as a peek into the future.

AutoGPT, an open-source initiative, employs GPT-4 to sift through the internet in a structured manner, formulate subtasks, and initiate new agents. This project has quickly become a sensation, marked by its rapid growth in popularity among the GitHub community. On the other hand, BabyAGI functions similarly via the integration of GPT-4, a vector store, and LangChain. It creates tasks based on prior outcomes and a set goal.

While AutoGPT and similar technologies are rapidly evolving, developers are also building on and improving these models. The intrigue surrounding these autonomous agents stems from a few key factors:

  • Limited human involvement: Unlike traditional systems like ChatGPT that require human prompts, autonomous agents such as AutoGPT and BabyAGI require minimal human intervention.
  • Diverse applications: The potential use cases for these autonomous agents are vast, spanning from personal assistants and problem solvers to automated aids for tasks like email management and prospecting.
  • Swift progress: The rapid pace of growth and interest in these projects highlights the significant potential of autonomous agents to revolutionize the AI landscape and beyond.

To effectively utilize these agents, we need to start by setting long-term goals tailored to the project's specific needs. These goals might encompass generating high-quality natural language text, answering questions with accuracy and context, and learning from user interactions for continuous performance improvement.

What is AutoGPT?

AutoGPT, a type of autonomous AI agent, is designed to carry out tasks until they are solved.

It brings three key features to the table:

  • Firstly, it's connected to the internet, allowing for real-time research and information retrieval.
  • Secondly, it can self-prompt, generating a list of sub-tasks to accomplish a given task.
  • Lastly, it can execute tasks, including spinning up other AI agents. While the first two features have been successful, the execution aspect has met with some challenges, including getting caught in loops or wrongly assuming a task has been completed.

The initial conception of AutoGPT was as a general autonomous agent capable of doing anything. However, this wide breadth of application seemed to dilute its effectiveness. As a result, a shift has been observed in the AutoGPT space, with developers starting to build specialized agents. These agents are designed to perform specific tasks effectively and efficiently, making them more practically useful.

How does AutoGPT work?

The concept behind AutoGPT is simple yet profound. Rather than only generating text in response to prompts like plain ChatGPT and GPT-4, AutoGPT is designed to generate, prioritize, and execute tasks. These tasks can range in complexity and are not confined to mere text generation.

AutoGPT can understand the overall goal, break it down into subtasks, execute those tasks, and dynamically adjust its actions based on the ongoing context.

AutoGPT uses plugins for internet browsing and other forms of access to gather the information it needs. Its external memory serves as a context-aware module, enabling it to evaluate its current situation, generate new tasks, self-correct if needed, and add new tasks to its queue. This allows for a dynamic flow of operations in which tasks are executed and constantly reprioritized based on the context and situation. This awareness of the task, the environment, and the goal at each point in the process transforms AutoGPT from a passive text generator into an active, goal-oriented agent.
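
The decide, act, observe, and remember cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not AutoGPT's actual code: `fake_llm`, `TOOLS`, and `run_agent` are hypothetical names, and the model call is replaced by a hard-coded stub so the example runs offline.

```python
def fake_llm(goal, memory):
    """Stand-in for a GPT-4 call (assumption): picks the next (tool, argument) pair."""
    if not memory:
        return ("search", goal)
    if len(memory) == 1:
        return ("summarize", memory[-1])
    return ("finish", "done")


# Hypothetical tools; a real agent would call plugins such as a web browser.
TOOLS = {
    "search": lambda query: f"results for '{query}'",
    "summarize": lambda text: f"summary of {text}",
}


def run_agent(goal, max_steps=5):
    memory = []  # external memory: observations collected so far
    for _ in range(max_steps):  # hard cap so the agent cannot loop forever
        tool, arg = fake_llm(goal, memory)  # decide the next action from goal + memory
        if tool == "finish":
            break
        observation = TOOLS[tool](arg)  # act
        memory.append(observation)  # remember the outcome for the next decision
    return memory


history = run_agent("best time to visit the Grand Canyon")
```

The `max_steps` cap is one simple guard against the looping behavior mentioned earlier; it does not address the other failure mode, where the agent wrongly decides a task is complete.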

While this could open up new vistas of AI-powered productivity and problem-solving, it also ushers in new challenges regarding control, misuse, and unforeseen consequences.

What is BabyAGI?

BabyAGI works similarly to AutoGPT. It operates in an infinite loop, pulling tasks from a list, executing them, enriching the results, and creating new tasks based on the previous task's objective and results. The concept is similar, but the specific implementation is different. Let’s see it in more detail.

How BabyAGI works

BabyAGI operates in a loop that revolves around four main sub-agents: the Execution Agent, the Task Creation Agent, the Prioritization Agent, and the Context Agent.

  1. Execution Agent: This is the agent that executes the tasks. It takes an objective and a task as parameters, constructs a prompt based on these inputs, and feeds it to an LLM (e.g., GPT-4). The LLM then returns a result, which is the outcome of executing the task.
  2. Task Creation Agent: Here, the system creates new tasks based on the previously executed task's objective and result. The agent uses a prompt that includes the task description and the current task list and feeds this prompt to the LLM, which generates a list of new tasks. These tasks are returned as a list of dictionaries, each dictionary representing a new task.
  3. Prioritization Agent: This agent is responsible for prioritizing the tasks in the task list.
  4. Context Agent: The scope of this agent is to collect the results from the Execution Agent and merge them with all the other intermediate results from the previous executions of the Execution Agent.
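
To make the interplay of the four agents concrete, here is a minimal sketch of the loop in plain Python. It is not the BabyAGI or LangChain implementation: every function below is a hypothetical stub, the LLM calls are replaced by string formatting, and the context lookup returns the most recent results rather than performing a vector-similarity search.

```python
from collections import deque


def execution_agent(objective, task, context):
    # A real implementation would prompt an LLM with the objective, task, and context.
    return f"result of '{task['task_name']}'"


def task_creation_agent(objective, result, task, task_list):
    # Stub: propose one follow-up per completed task, skipping tasks already queued.
    follow_up = f"Follow up on {task['task_name']}"
    existing = {t["task_name"] for t in task_list}
    return [] if follow_up in existing else [{"task_name": follow_up}]


def prioritization_agent(task_list, next_id):
    # Stub: keep the current order and renumber; a real agent would ask the LLM to reorder.
    return deque(
        {"task_id": next_id + i, "task_name": t["task_name"]}
        for i, t in enumerate(task_list)
    )


def context_agent(store, query, k=2):
    # Stub: return the k most recent results.
    return store[-k:]


def run(objective, first_task, max_iterations=3):
    tasks = deque([{"task_id": 1, "task_name": first_task}])
    store = []  # stands in for the vector store of past results
    for _ in range(max_iterations):
        if not tasks:
            break
        task = tasks.popleft()
        context = context_agent(store, objective)
        result = execution_agent(objective, task, context)
        store.append(result)
        tasks.extend(task_creation_agent(objective, result, task, tasks))
        tasks = prioritization_agent(tasks, task["task_id"] + 1)
    return store, list(tasks)


results, remaining = run("Plan a trip to the Grand Canyon", "Make a todo list")
```

With `max_iterations=3`, the loop executes three tasks and leaves one follow-up task in the queue, which mirrors how an iteration limit stops the agent before the task list is empty.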

We can conclude the following about BabyAGI:

  1. BabyAGI is an autonomous AI agent designed to execute tasks, generate new tasks based on previous task results, and re-prioritize tasks in real time. This showcases the potential of AI-powered language models to perform tasks autonomously within various constraints and contexts.
  2. The system utilizes the power of GPT-4 for task execution, a vector database for efficient search and storage of task-related data, and the LangChain framework to enhance the decision-making processes. The integration of these technologies allows BabyAGI to interact with its environment and perform tasks efficiently.
  3. A key feature of the system is its task management. BabyAGI maintains a task list for managing and prioritizing tasks. The system autonomously generates new tasks based on completed results and dynamically re-prioritizes the task list, highlighting the adaptability of AI-powered language models.
  4. By using GPT-4 and LangChain's capabilities, BabyAGI can not only complete tasks but also enrich and store results in the database. The agent thus becomes a learning system that can adapt and respond to new information and priorities.

A Code Example of Using BabyAGI

Although BabyAGI uses specific vector stores and model providers, one of the benefits of implementing it with LangChain is that you can easily swap those out for different options. In this implementation, we use a FAISS vector store.
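
As a rough illustration of what a flat L2 vector store does, and why one store can be swapped for another behind the same interface, the toy class below performs a brute-force nearest-neighbor search. It is not FAISS or LangChain code: `TinyFlatL2Store` and `toy_embed` are hypothetical, and the method names `add_texts` and `similarity_search` are chosen only to resemble a typical vector store interface.

```python
import math


class TinyFlatL2Store:
    """Brute-force nearest-neighbor store: a toy stand-in for a flat L2 index."""

    def __init__(self, embed):
        self.embed = embed  # function mapping text to a list of floats
        self.vectors = []
        self.texts = []

    def add_texts(self, texts):
        for text in texts:
            self.vectors.append(self.embed(text))
            self.texts.append(text)

    def similarity_search(self, query, k=1):
        q = self.embed(query)
        # Rank every stored vector by Euclidean (L2) distance to the query.
        ranked = sorted(
            zip(self.texts, self.vectors),
            key=lambda pair: math.dist(q, pair[1]),
        )
        return [text for text, _ in ranked[:k]]


def toy_embed(text):
    # Hypothetical 3-dimensional "embedding": counts of a few keywords.
    words = text.lower().split()
    return [words.count("hotel"), words.count("flight"), words.count("hike")]


store = TinyFlatL2Store(toy_embed)
store.add_texts(["book a hotel", "book a flight", "plan a hike"])
nearest = store.similarity_search("find a cheap hotel", k=1)
```

A real embedding model produces much longer vectors (1536 dimensions for the model used in this lesson), but the lookup principle is the same: embed the query, then return the stored texts whose vectors are closest.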

💡
LangChain has recently (as of August 2023) moved some classes from "langchain.experimental" to a separate library called "langchain_experimental", in an attempt to make the "langchain" library smaller. With the older version “langchain==0.0.208”, these classes are imported from “langchain.experimental”. With later versions, you have to (1) install the experimental library with “pip install langchain-experimental” and (2) replace all occurrences of “langchain.experimental” with “langchain_experimental”, as the code below already does.

Let’s set up the API keys as environment variables as always.

import os
os.environ["OPENAI_API_KEY"] = "<YOUR-OPENAI-API-KEY>"

We then create a vector store. Depending on which vector store you use, this step may look different. To proceed, install either the faiss-gpu or the faiss-cpu library. While we recommend using the latest version of the libraries, note that the code has been tested with version 1.7.2. Remember to install the other required packages with the following command:

pip install langchain==0.1.4 deeplake==3.9.27 openai==1.10.0 tiktoken langchain_experimental

If you get the error "Could not find a version that satisfies the requirement faiss (from versions: none)", install the faiss library with:

pip install faiss-cpu

from langchain.embeddings import OpenAIEmbeddings
import faiss
from langchain.vectorstores import FAISS
from langchain.docstore import InMemoryDocstore

# Define the embedding model
embeddings_model = OpenAIEmbeddings(model="text-embedding-ada-002")

# Initialize the vectorstore
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})

With the vector store ready, we create the BabyAGI agent and run it on a goal.

from langchain.llms import OpenAI
from langchain_experimental.autonomous_agents import BabyAGI

# set the goal
goal = "Plan a trip to the Grand Canyon"

# create the BabyAGI agent
# If max_iterations is None, the agent may go on forever if stuck in loops
baby_agi = BabyAGI.from_llm(
    llm=OpenAI(model="gpt-3.5-turbo-instruct", temperature=0),
    vectorstore=vectorstore,
    verbose=False,
    max_iterations=3
)
response = baby_agi({"objective": goal})

You should see something like the following printed output.

*****TASK LIST*****

1: Make a todo list

*****NEXT TASK*****

1: Make a todo list

*****TASK RESULT*****

1. Research the best time to visit the Grand Canyon
2. Book flights to the Grand Canyon
3. Book a hotel near the Grand Canyon
4. Research the best activities to do at the Grand Canyon
5. Make a list of items to pack for the trip
6. Make a budget for the trip
7. Make a list of places to eat near the Grand Canyon
8. Make a list of souvenirs to buy at the Grand Canyon
9. Make a list of places to visit near the Grand Canyon
10. Make a list of emergency contacts to have on hand during the trip

*****TASK LIST*****

2: Research the best way to get to the Grand Canyon from the airport
3: Research the best way to get around the Grand Canyon
4: Research the best places to take pictures at the Grand Canyon
5: Research the best places to take hikes at the Grand Canyon
6: Research the best places to view wildlife at the Grand Canyon
7: Research the best places to camp at the Grand Canyon
8: Research the best places to stargaze at the Grand Canyon
9: Research the best places to take a tour at the Grand Canyon
10: Research the best places to buy souvenirs at the Grand Canyon
11: Research the cost of activities at the Grand Canyon

*****NEXT TASK*****

2: Research the best way to get to the Grand Canyon from the airport

*****TASK RESULT*****

I will research the best way to get to the Grand Canyon from the airport. I will look into the different transportation options available, such as car rental, public transportation, and shuttle services. I will also compare the cost and convenience of each option. Additionally, I will research the best routes to take to get to the Grand Canyon from the airport.

*****TASK LIST*****

3: Research the best activities to do at the Grand Canyon
4: Research the best places to take pictures at the Grand Canyon
5: Research the best places to take hikes at the Grand Canyon
6: Research the best places to view wildlife at the Grand Canyon
7: Research the best places to camp at the Grand Canyon
8: Research the best places to stargaze at the Grand Canyon
9: Research the best places to take a tour at the Grand Canyon
10: Research the best places to buy souvenirs at the Grand Canyon
11: Research the cost of activities at the Grand Canyon
12: Research the best restaurants near the Grand Canyon
13: Research the best hotels near the Grand Canyon
14: Research the best way to get around the Grand Canyon
15: Research the best places to take a break from the heat at the Grand Canyon
16: Research the best places to take a break from the crowds at the Grand Canyon
17: Research the best places to take a break from the sun at the Grand Canyon
18: Research the best places to take a break from the wind at the Grand Canyon
19: Research the best places

*****NEXT TASK*****

3: Research the best activities to do at the Grand Canyon

*****TASK RESULT*****

To help you plan the best activities to do at the Grand Canyon, here are some suggestions:
1. Take a guided tour of the Grand Canyon. There are a variety of guided tours available, from helicopter tours to mule rides.
2. Hike the trails. There are a variety of trails to explore, from easy to difficult.
3. Visit the Grand Canyon Skywalk. This is a glass bridge that extends 70 feet over the edge of the canyon.
4. Take a rafting trip down the Colorado River. This is a great way to experience the canyon from a different perspective.
5. Visit the Grand Canyon Village. This is a great place to explore the history of the canyon and learn more about the area.
6. Take a scenic drive. There are a variety of scenic drives that offer stunning views of the canyon.
7. Go camping. There are a variety of camping sites available in the area, from primitive to RV sites.
8. Take a helicopter tour. This is a great way to get an aerial view of the canyon.
9. Visit the Desert View Watchtower. This is a great place to get a panoramic view of the canyon

*****TASK ENDING*****

This output reflects the systematic way in which the BabyAGI model approaches tasks.

It begins by outlining the tasks, making a to-do list for a trip to the Grand Canyon, and then proceeds to complete each task one by one.

For each task, it returns either concrete suggestions drawn from the model's knowledge (as in task 3) or a plan of the steps it would take to accomplish the task (as in task 2).

The agent also dynamically updates its task list based on new results, for example by expanding a broad task such as researching how to get to the Grand Canyon into more specific sub-tasks. This sequential, methodical approach underscores BabyAGI's ability to handle multi-step tasks in an organized manner.
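
Updating the task list means turning the model's free-text reply back into structured tasks. The sketch below shows one way this could be done, assuming the reply is a numbered list like the one printed above. `parse_task_list` is a hypothetical helper, not a LangChain function, and the de-duplication step is an addition for illustration.

```python
import re

# Matches lines such as "2: Research ..." or "2. Research ...".
TASK_LINE = re.compile(r"^\s*\d+[.:]\s*(.+?)\s*$")


def parse_task_list(llm_output, next_id):
    """Convert a numbered list in plain text into a list of task dictionaries."""
    tasks, seen = [], set()
    for line in llm_output.splitlines():
        match = TASK_LINE.match(line)
        if not match:
            continue  # skip headers and blank lines
        name = match.group(1)
        if name.lower() in seen:
            continue  # drop duplicate tasks
        seen.add(name.lower())
        tasks.append({"task_id": next_id + len(tasks), "task_name": name})
    return tasks


raw = """*****TASK LIST*****

2: Research the best way to get to the Grand Canyon from the airport
3: Research the best way to get around the Grand Canyon
3: Research the best way to get around the Grand Canyon
"""
tasks = parse_task_list(raw, next_id=2)
```

Renumbering from `next_id` keeps task IDs consecutive even when the model repeats or skips numbers in its reply.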

Future Possibilities

The future possibilities for AI agents like BabyAGI and AutoGPT are truly exciting, based on the potential improvements and applications.

As for the current status, each autonomous agent has its strengths and challenges: AutoGPT is powerful for complex tasks, though it has a steeper learning curve. BabyAGI excels at providing detailed task lists toward a goal, though it does face implementation hurdles. They both sometimes fall short in executing tasks, but these agents are improving every day with the effort of the open-source community.

These AI agents are already showing how they can navigate tasks and problems with autonomy that was previously the domain of human intellect.

In the next lesson we’ll use AutoGPT with LangChain and explain more about how it works.
