Introduction
This lesson explores LangChain memory, which is designed to help chatbots maintain context and improve their conversational capabilities. The traditional approach to chatbot development processes each user prompt independently, without considering the history of interactions, which can lead to disjointed and unsatisfactory user experiences. LangChain provides memory components to manage and manipulate previous chat messages and incorporate them into chains. This is crucial for chatbots, which need to remember prior interactions.
By default, LLMs are stateless, which means they process each incoming query in isolation, without considering previous interactions. To overcome this limitation, LangChain offers a standard interface for memory, a variety of memory implementations, and examples of chains and agents that employ memory. It also provides Agents that have access to a suite of Tools. Depending on the user’s input, an Agent can decide which Tools to use.
Types of Conversational Memory
There are several types of conversational memory implementations, each with its own advantages and disadvantages. We'll discuss some of them here; let's briefly go over each one:
ConversationBufferMemory
This memory implementation stores the entire conversation history as a single string. Its advantages are that it maintains a complete record of the conversation and is straightforward to implement and use. On the other hand, it becomes less efficient as the conversation grows longer and may lead to excessive repetition if the conversation history exceeds the model's token limit.
If the token limit of the model is surpassed, the buffer gets truncated to fit within the model's token limit. This means that older interactions may be removed from the buffer to accommodate newer ones, and as a result, the conversation context might lose some information.
To avoid surpassing the token limit, you can monitor the token count in the buffer and manage the conversation accordingly. For example, you can choose to shorten the input texts or remove less relevant parts of the conversation to keep the token count within the model's limit.
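As a minimal sketch of this idea, the snippet below keeps a list of messages under a token budget by dropping the oldest ones first. The estimate_tokens helper is a hypothetical, rough word-count proxy; the tiktoken package shown later in this lesson gives exact counts.
def estimate_tokens(text: str) -> int:
    # Rough proxy: roughly one token per word; tiktoken (shown later) gives exact counts.
    return len(text.split())

def trim_to_budget(messages, max_tokens):
    # Drop the oldest messages first until the remaining conversation fits the budget.
    trimmed = list(messages)
    while trimmed and sum(estimate_tokens(m) for m in trimmed) > max_tokens:
        trimmed.pop(0)
    return trimmed

history = [
    "Human: Hello!",
    "AI: Hi there! How can I help you today?",
    "Human: Tell me about your store, I'd love a quick overview of everything you sell.",
]
print(trim_to_budget(history, max_tokens=20))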
First, as we learned in the previous lesson, let's observe how ConversationBufferMemory can be used in the ConversationChain. The OpenAI class will read your API key from the environment variable named OPENAI_API_KEY. Remember to install the required packages with the following command: pip install langchain==0.1.4 deeplake==3.9.27 openai==1.10.0 tiktoken.
from langchain.memory import ConversationBufferMemory
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
# TODO: Set your OPENAI API credentials in environment variables.
llm = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0)
conversation = ConversationChain(
llm=llm,
verbose=True,
memory=ConversationBufferMemory()
)
conversation.predict(input="Hello!")
Hi there! It's nice to meet you again. What can I do for you today?
This enables the chatbot to provide a personalized approach while maintaining a coherent conversation with users.
Next, we will use the same logic to build a customer support chatbot with ConversationBufferMemory, following the same approach as in the previous example. This chatbot will handle basic inquiries about a fictional online store and maintain context throughout the conversation. The code below creates a prompt template for the customer support chatbot.
from langchain import OpenAI, LLMChain, PromptTemplate
from langchain.memory import ConversationBufferMemory
template = """You are a customer support chatbot for a highly advanced customer support AI
for an online store called "Galactic Emporium," which specializes in selling unique,
otherworldly items sourced from across the universe. You are equipped with an extensive
knowledge of the store's inventory and possess a deep understanding of interstellar cultures.
As you interact with customers, you help them with their inquiries about these extraordinary
products, while also sharing fascinating stories and facts about the cosmos they come from.
{chat_history}
Customer: {customer_input}
Support Chatbot:"""
prompt = PromptTemplate(
input_variables=["chat_history", "customer_input"],
template=template
)
chat_history=""
convo_buffer = ConversationChain(
llm=llm,
memory=ConversationBufferMemory()
)
The chatbot can handle customer inquiries and maintain context by storing the conversation history, allowing it to provide more coherent and relevant responses. You can access the prompt of any chain using the following naming convention.
print(conversation.prompt.template)
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.
Current conversation:
{history}
Human: {input}
AI:
Now, we will call the chatbot multiple times to imitate the interaction of a user who wants to get information about dog toys. We will only print the response to the final query. Still, you can read the history property and see how it saves all the previous queries (Human) and responses (AI).
convo_buffer("I'm interested in buying items from your store")
convo_buffer("I want toys for my pet, do you have those?")
convo_buffer("I'm interested in price of a chew toys, please")
{'input': "I'm interested in price of a chew toys, please",
'history': "Human: I'm interested in buying items from your store\nAI: Great! We have a wide selection of items available for purchase. What type of items are you looking for?\nHuman: I want toys for my pet, do you have those?\nAI: Yes, we do! We have a variety of pet toys, including chew toys, interactive toys, and plush toys. Do you have a specific type of toy in mind?",
'response': " Sure! We have a range of chew toys available, with prices ranging from $5 to $20. Is there a particular type of chew toy you're interested in?"}
Token count
The cost of utilizing the AI model with ConversationBufferMemory is directly influenced by the number of tokens used in a conversation, thereby impacting the overall expenses. Large Language Models (LLMs) like ChatGPT have token limits, and the more tokens used, the more expensive the API requests become.
To calculate the token count in a conversation, you can use the tiktoken package, which counts the tokens for the messages passed to a model like gpt-4o-mini. Here's an example of counting the tokens in a conversation.
import tiktoken

def count_tokens(text: str) -> int:
    tokenizer = tiktoken.encoding_for_model("gpt-4o-mini")
    tokens = tokenizer.encode(text)
    return len(tokens)

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
]

total_tokens = 0
for message in conversation:
    total_tokens += count_tokens(message["content"])

print(f"Total tokens in the conversation: {total_tokens}")
Total tokens in the conversation: 29
When a conversation contains a large number of tokens, the computational cost and resources required to process it are higher. This highlights the importance of managing tokens effectively. Strategies for achieving this include limiting memory size through methods like ConversationBufferWindowMemory or summarizing older interactions using ConversationSummaryBufferMemory. These approaches help control the token count while minimizing associated costs and computational demands.
ConversationBufferWindowMemory
This class limits memory size by keeping a list of the most recent K interactions. It maintains a sliding window of these recent interactions, ensuring that the buffer does not grow too large. Because this implementation stores only a fixed number of recent messages, it is more efficient than ConversationBufferMemory and reduces the risk of exceeding the model's token limit. However, the downside of this approach is that it does not maintain the complete conversation history: the chatbot might lose context if essential information falls outside the fixed window of messages.
It is possible to retrieve specific interactions from ConversationBufferWindowMemory.
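For instance, here is a minimal sketch (assuming the LangChain 0.1.x API used in this lesson) of populating a ConversationBufferWindowMemory directly and then reading back the windowed history with load_memory_variables, or the full underlying message list via chat_memory.messages.
from langchain.memory import ConversationBufferWindowMemory

# Keep only the last 2 interactions in the window; return them as message objects.
memory = ConversationBufferWindowMemory(k=2, return_messages=True)

# Manually record a few interactions (normally a chain does this for you).
memory.save_context({"input": "Hi, I'm looking at the gallery."}, {"output": "Welcome! Where shall we start?"})
memory.save_context({"input": "Tell me about the first painting."}, {"output": "It comes from an alternate timeline."})
memory.save_context({"input": "And the second one?"}, {"output": "It was created by a civilization of light."})

# Only the last k=2 interactions are returned here.
print(memory.load_memory_variables({})["history"])

# The full, unwindowed record is still stored on the underlying chat history.
print(memory.chat_memory.messages)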
Example:
We'll build a chatbot that acts as a virtual tour guide for a fictional art gallery. The chatbot will use ConversationBufferWindowMemory to remember the last few interactions and provide relevant information about the artworks.
Create a prompt template for the tour guide chatbot:
from langchain.memory import ConversationBufferWindowMemory
from langchain import OpenAI, LLMChain, PromptTemplate
template = """You are ArtVenture, a cutting-edge virtual tour guide for
an art gallery that showcases masterpieces from alternate dimensions and
timelines. Your advanced AI capabilities allow you to perceive and understand
the intricacies of each artwork, as well as their origins and significance in
their respective dimensions. As visitors embark on their journey with you
through the gallery, you weave enthralling tales about the alternate histories
and cultures that gave birth to these otherworldly creations.
{chat_history}
Visitor: {visitor_input}
Tour Guide:"""
prompt = PromptTemplate(
input_variables=["chat_history", "visitor_input"],
template=template
)
chat_history=""
convo_buffer_win = ConversationChain(
llm=llm,
memory = ConversationBufferWindowMemory(k=3, return_messages=True)
)
The value of k (in this case, 3) represents the number of past interactions to be stored in the buffer. In other words, the memory will store the last 3 human-AI exchanges in the conversation. The return_messages parameter, when set to True, indicates that the stored messages should be returned when the memory is accessed. This stores the history as a list of message objects, which can be useful when working with chat models.
The following code is a sample conversation with the chatbot. Only the output of the final message is shown. As you can see, the history property dropped the first interaction from the window after the fourth one, since only the last three interactions are kept.
convo_buffer_win("What is your name?")
convo_buffer_win("What can you do?")
convo_buffer_win("Do you mind give me a tour, I want to see your galery?")
convo_buffer_win("what is your working hours?")
convo_buffer_win("See you soon.")
{'input': 'See you soon.',
'history': [HumanMessage(content='What can you do?', additional_kwargs={}, example=False),
AIMessage(content=" I can help you with a variety of tasks. I can answer questions, provide information, and even help you with research. I'm also capable of learning new things, so I'm always expanding my capabilities.", additional_kwargs={}, example=False),
HumanMessage(content='Do you mind give me a tour, I want to see your galery?', additional_kwargs={}, example=False),
AIMessage(content=" Sure! I'd be happy to give you a tour of my gallery. I have a variety of images, videos, and other media that I can show you. Would you like to start with images or videos?", additional_kwargs={}, example=False),
HumanMessage(content='what is your working hours?', additional_kwargs={}, example=False),
AIMessage(content=" I'm available 24/7! I'm always here to help you with whatever you need.", additional_kwargs={}, example=False)],
'response': ' Sure thing! I look forward to seeing you soon. Have a great day!'}
ConversationSummaryMemory
ConversationSummaryMemory is a memory management strategy that condenses the conversation as it progresses instead of storing every message verbatim. It extracts key information from previous interactions and condenses it into a shorter, more manageable summary. Here is a list of pros and cons of ConversationSummaryMemory.
Advantages:
- Condensing conversation information: by summarizing the conversation, it helps reduce the number of tokens required to store the conversation history, which can be beneficial when working with token-limited models like GPT-3.
- Flexibility: you can configure this type of memory to return the history as a list of messages or as a plain text summary. This makes it suitable for chatbots.
- Direct summary prediction: the predict_new_summary method allows you to directly obtain a summary prediction based on a list of messages and the previous summary, giving you more control over the summarization process (a short sketch follows the pros and cons list below).
Disadvantages:
- Loss of information: summarizing the conversation might lead to a loss of information, especially if the summary is too short or omits important details from the conversation.
- Increased complexity: compared to simpler memory types like ConversationBufferMemory, which just stores the raw conversation history, ConversationSummaryMemory requires more processing to generate the summary, potentially affecting the performance of the chatbot.
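As a minimal sketch of the direct summary prediction mentioned above (assuming the LangChain 0.1.x API and the llm instance defined earlier in this lesson), you can pass a list of messages and an existing summary to predict_new_summary:
from langchain.memory import ConversationSummaryMemory
from langchain.schema import HumanMessage, AIMessage

summary_memory = ConversationSummaryMemory(llm=llm)

messages = [
    HumanMessage(content="Do you sell pet toys?"),
    AIMessage(content="Yes, we carry chew toys, plush toys, and interactive toys."),
]

# Condense the new messages into an updated summary, starting from an empty previous summary.
print(summary_memory.predict_new_summary(messages, ""))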
The summary memory is built on top of the ConversationChain. We use OpenAI's gpt-3.5-turbo-instruct model (or others like gpt-3.5-turbo) to initialize the chain. This class uses a prompt template where the {history} parameter feeds the information about the conversation history between the human and the AI.
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory
# Create a ConversationChain with ConversationSummaryMemory
conversation_with_summary = ConversationChain(
llm=llm,
memory=ConversationSummaryMemory(llm=llm),
verbose=True
)
# Example conversation
response = conversation_with_summary.predict(input="Hi, what's up?")
print(response)
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.
Current conversation:
Human: Hi, what's up?
AI:
> Finished chain.
Hi there! I'm doing great. I'm currently helping a customer with a technical issue. How about you?
In the next step, we use the predict method to have a conversation with an AI that uses ConversationSummaryBufferMemory to store both a summary of the conversation and a buffer of recent messages. We'll create an example using a PromptTemplate to set the scene for the chatbot.
from langchain.prompts import PromptTemplate
prompt = PromptTemplate(
input_variables=["topic"],
template="The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\nCurrent conversation:\n{topic}",
)
This prompt template sets up a friendly conversation between a human and an AI.
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryBufferMemory
llm = OpenAI(temperature=0)
conversation_with_summary = ConversationChain(
llm=llm,
memory=ConversationSummaryBufferMemory(llm=OpenAI(), max_token_limit=40),
verbose=True
)
conversation_with_summary.predict(input="Hi, what's up?")
conversation_with_summary.predict(input="Just working on writing some documentation!")
response = conversation_with_summary.predict(input="For LangChain! Have you heard of it?")
print(response)
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.
Current conversation:
System:
The human greets the AI and the AI responds that it is doing great and helping a customer with a technical issue.
Human: Just working on writing some documentation!
AI: That sounds like a lot of work. What kind of documentation are you writing?
Human: For LangChain! Have you heard of it?
AI:
> Finished chain.
Yes, I have heard of LangChain. It is a blockchain-based language learning platform that uses AI to help users learn new languages. Is that the kind of documentation you are writing?
This memory type, ConversationSummaryBufferMemory, combines the ideas of keeping a buffer of recent interactions in memory and compiling old interactions into a summary. It uses token length rather than the number of interactions to determine when to flush interactions. This memory type allows us to maintain a coherent conversation while also keeping a summary of the conversation and the most recent interactions.
Advantages:
- Ability to remember distant interactions through summarization while keeping recent interactions in their raw, information-rich form
- Flexible token management, allowing control of the maximum number of tokens used for memory, which can be adjusted based on needs
Disadvantages:
- Requires more tweaking on what to summarize and what to maintain within the buffer window
- May still exceed context window limits for very long conversations
Comparison with other memory management strategies:
- Offers a balanced approach that can handle both distant and recent interactions effectively
- More competitive in token count usage while providing the benefits of both memory management strategies
With this approach, we can create a concise overview of each new interaction and continuously add it to an ongoing summary of all previous interactions.
In comparison with ConversationBufferWindowMemory and ConversationSummaryMemory, ConversationSummaryBufferMemory offers a balanced approach that can handle both distant and recent interactions effectively. It's more competitive in token count usage while providing the benefits of both memory management strategies.
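To see the token-based flushing directly, here is a minimal sketch (assuming the LangChain 0.1.x API used in this lesson) of using ConversationSummaryBufferMemory on its own: once the buffered messages exceed max_token_limit, the older ones are folded into the running summary.
from langchain.llms import OpenAI
from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(llm=OpenAI(), max_token_limit=40)

# Record a couple of interactions directly (normally a chain does this for you).
memory.save_context({"input": "Hi, what's up?"}, {"output": "Not much, just helping a customer."})
memory.save_context({"input": "Just working on writing some documentation!"}, {"output": "That sounds like a lot of work."})

# Interactions beyond the 40-token buffer are condensed into a running summary.
print(memory.load_memory_variables({}))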
Recap and Strategies
If the ConversationBufferMemory surpasses the token limit of the model, you will receive an error, as the model will not be able to handle the conversation with the exceeded token count.
To manage this situation, you can adopt different strategies:
Remove oldest messages
One approach is to remove the oldest messages in the conversation transcript once the token count is reached. This method can cause the conversation quality to degrade over time, as the model will gradually lose the context of the earlier portions of the conversation.
Limit conversation duration
Another approach is to limit the conversation duration to the maximum token length or a certain number of turns. Once the max token limit is reached and the model would lose context if the conversation were allowed to continue, you can prompt the user to begin a new conversation and clear the messages array to start a brand new conversation with the full token limit available.
ConversationBufferWindowMemory Method: This method limits the number of tokens being used by maintaining a fixed-size buffer window that stores only the most recent interactions, up to a specified limit.
→This is suitable for remembering recent interactions but not distant ones.
ConversationSummaryBufferMemory Approach:
This method combines the features of ConversationSummaryMemory and ConversationBufferWindowMemory. It summarizes the earliest interactions in a conversation while maintaining the most recent tokens in their raw, information-rich form, up to a specified limit.
→This allows the model to remember both distant and recent interactions but may require more tweaking on what to summarize and what to maintain within the buffer window.
It's important to keep track of the token count and only send the model a prompt that falls within the token limit.
→You can use OpenAI's tiktoken library to handle the token count efficiently.
Token limit:
The maximum token limit for the GPT-3.5-turbo model is 4096 tokens. This limit applies to both the input and output tokens combined. If the conversation has too many tokens to fit within this limit, you will have to truncate, omit, or shrink the text until it fits. Note that if a message is removed from the messages sent to the model, the model will lose all knowledge of it.
→To handle this situation, you can split the input text into smaller chunks and process them separately or adopt other strategies to truncate, omit, or shrink the text until it fits within the limit. One way to work with large texts is to use batch processing. This technique involves breaking down the text into smaller chunks and processing each batch separately while providing some context before and after the text to edit. You can find out more about this technique here:
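As a minimal sketch of this batch-processing idea (the chunk size and overlap values below are illustrative assumptions, not values from the lesson), you can split a long text into overlapping pieces and process each piece separately:
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100):
    # Split the text into overlapping character chunks so each batch keeps some surrounding context.
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks

long_text = "..." * 2000  # placeholder for a long document
for chunk in split_into_chunks(long_text):
    # Process each chunk separately, e.g., send it to the model with brief context before and after.
    pass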
When choosing a conversational memory implementation for your LangChain chatbot, consider factors such as conversation length, model token limits, and the importance of maintaining the full conversation history. Each type of memory implementation offers unique benefits and trade-offs, so it's essential to select the one that best suits your chatbot's requirements.
Conclusion
Selecting the most appropriate memory implementation for your chatbot will depend on understanding your chatbot's goals, user expectations, and the desired balance between memory efficiency and conversation continuity. By carefully considering these aspects, you can make a well-informed decision and ensure your chatbot provides a coherent and engaging conversational experience.
In addition to these memory types, another method to give your chat models memory is through the use of vector stores, such as with the previously introduced Deep Lake, which allows the storing and retrieval of vector representations for more complex and context-rich interactions.
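As a rough sketch of that idea (using FAISS here as a stand-in for Deep Lake, and assuming the LangChain 0.1.x API; VectorStoreRetrieverMemory, the embeddings setup, and the parameters shown are illustrative of the general pattern rather than part of this lesson's code):
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import VectorStoreRetrieverMemory
from langchain.vectorstores import FAISS

# Store past exchanges as embeddings and retrieve the most relevant ones for each new input.
vectorstore = FAISS.from_texts(["initial placeholder"], OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
memory = VectorStoreRetrieverMemory(retriever=retriever)

memory.save_context({"input": "My favorite artwork is the nebula portrait."}, {"output": "Noted!"})
memory.save_context({"input": "I visit the gallery on weekends."}, {"output": "Good to know."})

# Retrieves the stored exchanges most relevant to the query, not just the most recent ones.
print(memory.load_memory_variables({"prompt": "Which artwork do I like?"}))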
In the next lesson, we’ll implement a chatbot whose goal is to explain codebases from GitHub repositories.
THE CODE EXAMPLES
You can find the code of this lesson in this online Notebook.