Quick Intro to Large Language Models

Introduction

In this lesson, we will explore how large language models learn token distributions and predict the next token, allowing them to generate human-like text that can both amaze and perplex us.

We'll start with a quick introduction to the inner workings of GPT-3 and GPT-4, focusing on their few-shot learning capabilities, emergent abilities, and the scaling laws that drive their success. We will then dive into some easy-to-understand examples of how these models excel in tasks such as text summarization and translation just by providing a few examples without the need for fine-tuning.

But it's not all smooth sailing in the world of LLMs. We will also discuss some of the potential pitfalls, including hallucinations and biases, which can lead to inaccurate or misleading outputs. It's essential to be aware of these limitations when using LLMs in use cases where 100% accuracy is paramount. On the flip side, their creative process can be invaluable in tasks where imagination takes center stage.

We will also touch upon the context size and maximum number of tokens that LLMs can handle, shedding light on the factors that define their performance.

LLMs in general:

LLMs are deep learning models with billions of parameters that excel at a wide range of natural language processing tasks. They can perform tasks like translation, sentiment analysis, and chatbot conversations without being specifically trained for them. LLMs can be used without fine-tuning by employing "prompting" techniques, where a question is presented as a text prompt with examples of similar problems and solutions.
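
As a small illustration of this prompting idea, a prompt can bundle a couple of solved examples together with the new question. The sentiment task and example reviews below are made up purely for demonstration:

# A minimal, hypothetical prompt that teaches the task through examples
prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day." Sentiment: Positive
Review: "The screen cracked within a week." Sentiment: Negative
Review: "Setup was quick and painless." Sentiment:"""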

  • Architecture:

LLMs typically consist of multiple layers of neural networks, feedforward layers, embedding layers, and attention layers. These layers work together to process input text and generate output predictions.

  • Future implications:

While LLMs have the potential to revolutionize various industries, it is important to be aware of their limitations and ethical implications. Businesses and workers should carefully consider the trade-offs and risks associated with using LLMs, and developers should continue refining these models to minimize biases and improve their usefulness in different applications. Throughout the course, we will address certain limitations and offer potential solutions to overcome them.

Maximum number of tokens

In the LangChain library, the LLM context size, or the maximum number of tokens the model can process, is determined by the specific implementation of the LLM. In the case of the OpenAI implementation in LangChain, the maximum number of tokens is defined by the underlying OpenAI model being used. To find the maximum number of tokens for an OpenAI model, refer to the max_tokens attribute in the OpenAI documentation or API reference.

For example, if you’re using the GPT-3 model, the maximum number of tokens supported by the model is 2,049. The maximum number of tokens depends on the specific model version and its variant (e.g., davinci, curie, babbage, or ada). Each version has different limitations, with newer versions typically supporting a larger number of tokens.

It is important to ensure that the input text does not exceed the maximum number of tokens supported by the model, as this may result in truncation or errors during processing. To handle this, you can split the input text into smaller chunks and process them separately, making sure that each chunk is within the allowed token limit. You can then combine the results as needed.
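
Before splitting anything, it often helps to check how many tokens a given input actually uses. Here is a minimal sketch using LangChain’s OpenAI wrapper, whose base interface exposes a get_num_tokens helper; the input string is just a placeholder:

from langchain.llms import OpenAI

llm = OpenAI(model_name="gpt-3.5-turbo-instruct")

# Count the tokens the model would receive for this input
text = "your_long_input_text"
num_tokens = llm.get_num_tokens(text)
print(f"The input uses {num_tokens} tokens")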

Here's an example of how you might handle text that exceeds the maximum token limit for a given LLM in LangChain. Mind that the following code is partly pseudocode. It's not supposed to run, but it should give you the idea of how to handle texts longer than the maximum token limit.

from langchain.llms import OpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Before executing the following code, make sure to have
# your OpenAI key saved in the "OPENAI_API_KEY" environment variable.
# Initialize the LLM
llm = OpenAI(model_name="gpt-3.5-turbo-instruct")

# Define the input text
input_text = "your_long_input_text"

# Determine the maximum number of tokens from documentation
max_tokens = 4097

# Split the input text into chunks based on the max tokens
text_chunks = split_text_into_chunks(input_text, max_tokens)

# Process each chunk separately
results = []
for chunk in text_chunks:
    result = llm(chunk)
    results.append(result)

# Combine the results as needed
final_result = combine_results(results)

In this example, split_text_into_chunks and combine_results are custom functions that you would need to implement based on your specific requirements, and we will cover them in later lessons. The key takeaway is to ensure that the input text does not exceed the maximum number of tokens supported by the model.
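
As a preview, here is one possible sketch of those two helpers using the RecursiveCharacterTextSplitter imported above. Treat it as an illustrative approximation (it converts the token budget into a rough character budget), not the implementation we will build later:

from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_text_into_chunks(input_text, max_tokens):
    # Rough approximation: assume about four characters per token
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=max_tokens * 4,
        chunk_overlap=200
    )
    return splitter.split_text(input_text)

def combine_results(results):
    # Simply concatenate the per-chunk outputs
    return "\n".join(results)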

Note that splitting into multiple chunks can hurt the coherence of the text.

Token Distributions and Predicting the Next Token

Large language models like GPT-3 and GPT-4 are pretrained on vast amounts of text data and learn to predict the next token in a sequence based on the context provided by the previous tokens. GPT-family models use Causal Language modeling, which predicts the next token while only having access to the tokens before it. This process enables LLMs to generate contextually relevant text.

The following code uses LangChain’s OpenAI class to load the gpt-3.5-turbo-instruct model and complete the input sequence, which results in the answer. Before executing the following code, save your OpenAI key in the “OPENAI_API_KEY” environment variable. Moreover, remember to install the required packages with the following command: pip install langchain==0.1.4 deeplake openai==1.10.0 tiktoken.

from langchain.llms import OpenAI

llm = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0)

text = "What would be a good company name for a company that makes colorful socks?"

print(llm(text))
The sample code.
Rainbow Socks Co.
The output.

Tracking Token Usage

You can use the LangChain library's callback mechanism to track token usage. This is currently implemented only for the OpenAI API:

from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback

llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)

with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    print(cb)
The sample code.
Tokens Used: 46
	Prompt Tokens: 4
	Completion Tokens: 42
Successful Requests: 1
Total Cost (USD): $0.0009199999999999999
The output.

The callback will track the tokens used, successful requests, and total cost.
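
If you need the numbers themselves rather than the printed summary, the callback object also exposes them as attributes; the names below assume LangChain’s standard OpenAI callback handler:

with get_openai_callback() as cb:
    result = llm("Tell me a joke")

# Read the individual counters from the callback object
print(f"Total tokens: {cb.total_tokens}")
print(f"Prompt tokens: {cb.prompt_tokens}")
print(f"Completion tokens: {cb.completion_tokens}")
print(f"Total cost (USD): {cb.total_cost}")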

Few-shot learning

Few-shot learning is a remarkable ability that allows LLMs to learn and generalize from limited examples. Prompts serve as the input to these models and play a crucial role in achieving this feature. With LangChain, examples can be hard-coded, but dynamically selecting them often proves more powerful, enabling LLMs to swiftly adapt to and tackle tasks with minimal training data.

This approach involves using the FewShotPromptTemplate class, which takes in a PromptTemplate and a list of few-shot examples. The class formats the prompt template with the few-shot examples, which helps the language model generate a better response. We can streamline this process by utilizing LangChain's FewShotPromptTemplate to structure the approach:

from langchain import PromptTemplate
from langchain import FewShotPromptTemplate

# create our examples
examples = [
    {
        "query": "What's the weather like?",
        "answer": "It's raining cats and dogs, better bring an umbrella!"
    }, {
        "query": "How old are you?",
        "answer": "Age is just a number, but I'm timeless."
    }
]

# create an example template
example_template = """
User: {query}
AI: {answer}
"""

# create a prompt example from above template
example_prompt = PromptTemplate(
    input_variables=["query", "answer"],
    template=example_template
)

# now break our previous prompt into a prefix and suffix
# the prefix is our instructions
prefix = """The following are excerpts from conversations with an AI
assistant. The assistant is known for its humor and wit, providing
entertaining and amusing responses to users' questions. Here are some
examples:
"""
# and the suffix our user input and output indicator
suffix = """
User: {query}
AI: """

# now create the few-shot prompt template
few_shot_prompt_template = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n\n"
)

After creating the template, we pass it, along with the user query, to a chat model through an LLMChain and get the result:

from langchain.chat_models import ChatOpenAI
from langchain import LLMChain

# load the model
chat = ChatOpenAI(model_name="gpt-4", temperature=0.0)

chain = LLMChain(llm=chat, prompt=few_shot_prompt_template)
chain.run("What's the meaning of life?")
The sample code
To live life to the fullest and enjoy the journey!
The output.

Emergent abilities, Scaling laws, and hallucinations

Another aspect of LLMs is their emergent abilities, which arise as a result of extensive pre-training on vast datasets. These capabilities are not explicitly programmed but emerge as the model discerns patterns within the data. LangChain models capitalize on these emergent abilities by working with various types of models, such as chat models and text embedding models. This allows LLMs to perform diverse tasks, from answering questions to generating text and offering recommendations.

Lastly, scaling laws describe the relationship between model size, training data, and performance. Generally, as the model size and training data volume increase, so does the model's performance. However, this improvement is subject to diminishing returns and may not follow a linear pattern. It is essential to weigh the trade-off between model size, training data, performance, and resources spent on training when selecting and fine-tuning LLMs for specific tasks.

While large language models boast remarkable capabilities, they are not without shortcomings. One notable limitation is the occurrence of hallucinations, in which these models produce text that appears plausible on the surface but is actually factually incorrect or unrelated to the given input. Additionally, LLMs may exhibit biases originating from their training data, resulting in outputs that can perpetuate stereotypes or generate undesired outcomes.

Examples with Easy Prompts: Text Summarization, Text Translation, and Question Answering

In the realm of natural language processing, Large Language Models have become a popular tool for tackling various text-based tasks. These models can be prompted in different ways to produce a range of results, depending on the desired outcome.

Setting Up the Environment

To begin, we will need to install the huggingface_hub library in addition to the previously installed packages and dependencies. Also, remember to create a Hugging Face API key by navigating to the Access Tokens page under your account’s Settings. The key must be set as an environment variable named HUGGINGFACEHUB_API_TOKEN.

!pip install -q huggingface_hub
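
If you prefer to set the keys from within Python rather than in your shell, a minimal sketch follows; the placeholder values are hypothetical and must be replaced with your own keys:

import os

# Replace the placeholders with your actual keys
os.environ["OPENAI_API_KEY"] = "<your_openai_api_key>"
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "<your_huggingface_api_token>"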

Creating a Question-Answering Prompt Template

Let's create a simple question-answering prompt template using LangChain:

from langchain import PromptTemplate

template = """Question: {question}

Answer: """
prompt = PromptTemplate(
    template=template,
    input_variables=['question']
)

# user question
question = "What is the capital city of France?"

Next, we will use the Hugging Face model google/flan-t5-large to answer the question. The HuggingFaceHub class will connect to Hugging Face’s Inference API and load the specified model.

from langchain import HuggingFaceHub, LLMChain

# initialize Hub LLM
hub_llm = HuggingFaceHub(
    repo_id='google/flan-t5-large',
    model_kwargs={'temperature': 0}
)

# create prompt template > LLM chain
llm_chain = LLMChain(
    prompt=prompt,
    llm=hub_llm
)

# ask the user question about the capital of France
print(llm_chain.run(question))
The sample code.
paris
The output.

We can also modify the prompt template to include multiple questions.

Asking Multiple Questions

To ask multiple questions, we can either iterate through all questions one at a time or place all questions into a single prompt for more advanced LLMs. Let's start with the first approach:

qa = [
    {'question': "What is the capital city of France?"},
    {'question': "What is the largest mammal on Earth?"},
    {'question': "Which gas is most abundant in Earth's atmosphere?"},
    {'question': "What color is a ripe banana?"}
]
res = llm_chain.generate(qa)
print( res )
The sample code.
LLMResult(generations=[[Generation(text='paris', generation_info=None)], [Generation(text='giraffe', generation_info=None)], [Generation(text='nitrogen', generation_info=None)], [Generation(text='yellow', generation_info=None)]], llm_output=None)
The output.

We can modify our prompt template to include multiple questions to implement a second approach. The language model will understand that we have multiple questions and answer them sequentially. This method performs best on more capable models, so the chain below uses the OpenAI llm instance defined earlier rather than the Hugging Face model.

multi_template = """Answer the following questions one at a time.

Questions:
{questions}

Answers:
"""
long_prompt = PromptTemplate(template=multi_template, input_variables=["questions"])

llm_chain = LLMChain(
    prompt=long_prompt,
    llm=llm
)

qs_str = (
    "What is the capital city of France?\n" +
    "What is the largest mammal on Earth?\n" +
    "Which gas is most abundant in Earth's atmosphere?\n" +
    "What color is a ripe banana?\n"
)
llm_chain.run(qs_str)
The sample code.
1. The capital city of France is Paris.
2. The largest mammal on Earth is the blue whale.
3. The gas that is most abundant in Earth's atmosphere is nitrogen.
4. A ripe banana is yellow.
The output. Note that your output may be different because of the inherent randomness in the text generation process.

Text Summarization

Using LangChain, we can create a chain for text summarization. First, we need to set up the necessary imports and an instance of the OpenAI language model:

from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

Next, we define a prompt template for summarization:

summarization_template = "Summarize the following text to one sentence: {text}"
summarization_prompt = PromptTemplate(input_variables=["text"], template=summarization_template)
summarization_chain = LLMChain(llm=llm, prompt=summarization_prompt)

To use the summarization chain, simply call the predict method with the text to be summarized:

text = "LangChain provides many modules that can be used to build language model applications. Modules can be combined to create more complex applications, or be used individually for simple applications. The most basic building block of LangChain is calling an LLM on some input. Let’s walk through a simple example of how to do this. For this purpose, let’s pretend we are building a service that generates a company name based on what the company makes."
summarized_text = summarization_chain.predict(text=text)
print(summarized_text)
The sample code.
LangChain offers various modules for developing language model applications, which can be used alone for simple applications or combined for more complex ones.
The output.

Text Translation

One of the great attributes of large language models is that they can perform multiple tasks simply by changing the prompt. We use the same llm variable as defined before but pass a different prompt that asks for a translation of the text from a source_language to a target_language.

translation_template = "Translate the following text from {source_language} to {target_language}: {text}"
translation_prompt = PromptTemplate(input_variables=["source_language", "target_language", "text"], template=translation_template)
translation_chain = LLMChain(llm=llm, prompt=translation_prompt)

To use the translation chain, call the predict method with the source language, target language, and text to be translated:

source_language = "English"
target_language = "French"
text = "Your text here"
translated_text = translation_chain.predict(source_language=source_language, target_language=target_language, text=text)
print(translated_text)
The sample code.
Votre texte ici
The output.

You can further explore the LangChain library for more advanced use cases and create custom chains tailored to your requirements.
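
For instance, one simple way to combine the pieces above into a custom pipeline is a sequential chain that first summarizes a text and then translates the summary. The following is only an illustrative sketch that reuses the llm instance and summarization_chain defined in this section:

from langchain.chains import LLMChain, SimpleSequentialChain
from langchain.prompts import PromptTemplate

# A one-input translation chain so it can be composed sequentially
to_french_prompt = PromptTemplate(
    input_variables=["text"],
    template="Translate the following text to French: {text}"
)
to_french_chain = LLMChain(llm=llm, prompt=to_french_prompt)

# First summarize, then translate the summary
summarize_then_translate = SimpleSequentialChain(
    chains=[summarization_chain, to_french_chain]
)
print(summarize_then_translate.run(text))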

Conclusion

In conclusion, large language models (LLMs) such as GPT-3, ChatGPT, and GPT-4 have shown remarkable capabilities in generating human-like text, driven by their few-shot learning and emergent abilities. These models also excel in tasks like text summarization and translation, often without the need for fine-tuning. However, it is crucial to acknowledge the potential pitfalls, such as hallucinations and biases, that can result in misleading or inaccurate outputs.

While LLMs can be a powerful creative asset, it is essential to be aware of their limitations and use them cautiously in cases requiring absolute accuracy. Furthermore, understanding the context size and maximum token limitations is vital to optimizing LLM performance. As we continue to develop and utilize LLMs, balancing their potential benefits with the need to mitigate risks and address their inherent limitations is imperative.

In the next lesson, you’ll find a first introduction to developing applications that leverage LangChain.

You can find the code of this lesson in this online Notebook.