Introduction
In this lesson, we will explore the Assistants API from OpenAI. We will learn about its primary features, including the Code Interpreter, Knowledge Retrieval, and Function Calling capabilities.
We share a hands-on example to demonstrate the integration of the Code Interpreter with an existing Assistant. The example will show how to enhance an Assistant's ability to provide technical solutions by executing Python code, thus reducing LLM “hallucinations.”
We will also introduce other advanced technologies from OpenAI, such as Whisper, DALL-E 3, Text to Speech, and the GPT-4 Vision API. These tools are useful for anyone looking to develop sophisticated AI assistants using a variety of APIs.
Then, we will learn how to use the free Hugging Face Inference API to get access to the thousands of models hosted on their platform.
By the end of this lesson, you will have gained a solid understanding of how to apply these technologies in your AI projects effectively. Before starting this guide, please make sure to install all the requirements in the requirements section.
OpenAI Assistant’s Built-in Functionalities
The OpenAI Assistants API includes three main functionalities: Code Interpreter, Retrieval, and Function Calling.
Code Interpreter: This functionality allows the Assistant to generate and run Python code in a sandboxed execution environment. The Assistant can use Code Interpreter automatically from your conversation or when you upload a file with data.
It's a tool that transforms the LLM into a more accurate computational problem-solver that can handle tasks like solving complex math equations. It can also generate files with data and images of graphs from the same Python code. It's a useful way to trust the output from the assistant and a great tool when analyzing data.
Knowledge Retrieval: This is OpenAI’s own retrieval-augmented generation (RAG) system offered as part of the Assistants API. It allows multiple file uploads. Once the files are uploaded and passed to the Assistant, OpenAI will automatically chunk your documents, index them, store the embeddings, and implement vector search to retrieve relevant content to answer user queries.
Function Calling: Function calling allows you to describe functions or tools to the Assistant and have it return the functions that need to be called along with their arguments. It's a powerful way to add new capabilities to your Assistant.
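For illustration, here is a minimal sketch of how a function tool could be described when creating an Assistant. The get_current_weather function and its schema are hypothetical, and the snippet assumes the openai client is already configured (installation and API keys are covered below).
from openai import OpenAI

client = OpenAI()

# Hypothetical example: describe a get_current_weather function the Assistant may ask to call
assistant = client.beta.assistants.create(
    instructions="You are a weather bot. Use the provided functions to answer questions.",
    model="gpt-4-1106-preview",
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather for a given city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "The city name"}
                    },
                    "required": ["location"],
                },
            },
        }
    ],
)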
How To Set Up an Assistant
You have two distinct pathways depending on your needs and expertise:
- Assistants Playground: Ideal for those looking to get a feel for the Assistant's capabilities without going into complex integrations.
- Detailed Integration through the API: Best suited for those who require a more customized and in-depth setup.
STEP-BY-STEP ASSISTANT CREATION:
- Creating an Assistant
- Setting up a Thread
- Adding a Message
- Executing with Run
- Displaying the Response
Purpose: An Assistant object represents an entity/agent that can be configured to respond to users’ messages in different ways using several parameters.
Model Selection: You can specify any version of the GPT-4 models, including fine-tuned models. OpenAI recommends using its latest models with the Assistants API for best results and maximum compatibility with tools. Thus, choose between the gpt-4o-mini and gpt-4-1106-preview models.
Tools: The Assistant supports the Code Interpreter for technical queries that require Python code execution or Knowledge Retrieval to augment the Assistant with proprietary external information.
pip install openai
Run the following and enter your OpenAI API key when prompted; you can get it from your OpenAI developer account.
import os, getpass
os.environ["ACTIVELOOP_TOKEN"] = getpass.getpass("Enter your ActiveLoop API key: ")
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
Role: A Thread acts as the foundational unit of user interaction. It can be seen as a single conversation. Pass any user-specific context and files in this thread by creating Messages.
from openai import OpenAI
client = OpenAI()
thread = client.beta.threads.create()
Customization: Within a Thread, you can ingest user-specific context or attach necessary files so that each conversation is unique and personalized.
Threads don’t have a size limit. You can add as many messages as you want to a conversation/Thread. The Assistant will ensure that requests to the model fit within the maximum context window, using relevant optimization techniques used in ChatGPT, such as truncation.
Definition: Messages are user inputs, and the Assistant’s answers are appended to a Thread. User inputs can be questions or commands.
Function: They serve as the primary mode of communication between the user and the Assistant.
message = client.beta.threads.messages.create(
thread_id=thread.id,
role="user",
content="I need to solve the equation `3x + 11 = 14`. Can you help me?"
)
Messages can include text, images, and other files and are stored as a list on the Thread. Image input with GPT-4 with Vision is not supported here, but you can upload image files and have them processed via retrieval.
Activation: For the Assistant to respond to the user message, you must create a Run. The Assistant will then automatically decide what previous Messages to include in the context window for the model.
Process: The Assistant processes the entire Thread, employs its tools if required, and formulates an appropriate response.
During its run, the Assistant can call tools or create Messages. Examining Run Steps allows you to check how the Assistant is getting to its final results.
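For the Thread above, a Run is created by referencing the Assistant's ID (this assumes the assistant object created earlier):
# Create a Run so the Assistant processes the Thread
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)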
Outcome: The assistant’s response to a Run:
messages = client.beta.threads.messages.list(
thread_id=thread.id
)
These responses are displayed to the user! During this Run, the Assistant added two new Messages to the Thread.
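As a small sketch of how they can be displayed, you can iterate over the returned list (by default it is ordered newest-first):
# Print the conversation in chronological order (the API returns newest messages first)
for message in reversed(messages.data):
    print(f"{message.role}: {message.content[0].text.value}")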
ASSISTANT’S CORE MECHANISM:
Creating an Assistant only requires specifying the model. But you can further customize the behavior of the Assistant:
- Use the instructions parameter to guide the personality of the Assistant and define its goals. Instructions are similar to system messages in the Chat Completions API.
- Use the tools parameter to give the Assistant access to up to 128 tools in parallel. You can give it access to OpenAI-hosted tools (Code Interpreter, Knowledge Retrieval) or call third-party tools via function calling.
- Use the file_ids parameter to give the tools access to files. Files are uploaded using the File Upload endpoint.
Example demonstration:
Imagine you're developing an AI assistant for a tech company. This assistant needs to provide detailed product support using a comprehensive knowledge base.
Upload Files to a Knowledge Base:
First, make a folder to store all the files you’ll create. Upload a detailed PDF manual of a product line (e.g., "tech_manual.pdf") using the API:
file = client.files.create(
file=open("tech_manual.pdf", "rb"),
purpose="assistants"
)
Now you can create the assistant with access to the uploaded file by enabling the file search tool: tools=[{"type": "file_search"}]. Retrieval files are attached through a vector store that is referenced in tool_resources:
vector_store = client.beta.vector_stores.create(
    name="Product Manuals",
    file_ids=[file.id],
)
assistant = client.beta.assistants.create(
    instructions="You are a tech support chatbot. Use the product manual to respond accurately to customer inquiries.",
    model="gpt-4-1106-preview",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {"vector_store_ids": [vector_store.id]}
    },
)
User Interaction:
To interact with the assistant, you need a thread and a message. The message should contain the customer's question. Here's an example:
thread = client.beta.threads.create()
message = client.beta.threads.messages.create(
thread_id=thread.id,
role="user",
content="How do I reset my Model X device?",
)
RUN Thread:
- A customer asks, "How do I reset my Model X device?"
The assistant accesses the uploaded manual, performs a vector search to find the relevant section, and provides clear, step-by-step reset instructions.
run = client.beta.threads.runs.create(
thread_id=thread.id,
assistant_id=assistant.id,
)
# The run will enter the "queued" state before it continues its execution.
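Because Runs execute asynchronously, a simple way to wait for completion before reading the response is to poll the Run's status (a minimal sketch; newer client versions also provide built-in polling helpers):
import time

# Poll the Run until it reaches a terminal state
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id,
    )
print(run.status)  # expected: "completed"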
Information retrieval:
After the run is complete, you can retrieve the assistant's response:
messages = client.beta.threads.messages.list(
thread_id=thread.id
)
assistant_response = messages.data[0].content[0].text.value
assistant_response
The output result should contain the assistant's response to the customer's question based on knowledge from the uploaded manual.
You can see the full code and more examples in this Colab notebook.
OpenAI’s Other Advanced Models
OpenAI also offers different types of models that are not yet integrated into the Assistants API but are accessible. These models offer voice processing, image understanding, and image generation capabilities.
Whisper-v3
Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It is a transformer-based encoder-decoder model, which is a type of sequence-to-sequence model. The latest large-v3 model shows improved performance across various languages compared to Whisper large-v2. OpenAI released the model’s weights under the Apache License 2.0. The model is available on Hugging Face.
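Through the OpenAI API, transcription is available via the Audio endpoint, where the hosted model is exposed as whisper-1. Here is a minimal sketch (the audio file name is illustrative):
from openai import OpenAI

client = OpenAI()

# Transcribe a local audio file with the hosted Whisper model
with open("meeting_recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)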
Text to Speech
TTS is an AI model that converts text to natural-sounding spoken audio. OpenAI offers two model variants: tts-1 is optimized for real-time text-to-speech use cases, and tts-1-hd is optimized for quality. These models can be used with the Speech endpoint in the Audio API.
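Here is a short sketch of calling the Speech endpoint (the voice choice, input text, and output path are illustrative):
from openai import OpenAI

client = OpenAI()

# Convert text to spoken audio with the real-time-optimized model
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",  # one of the preset voices
    input="Hello! Your order has shipped and should arrive within three days.",
)
response.stream_to_file("greeting.mp3")  # save the generated audio as an MP3 file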
DALL-E 3
DALL-E 3 is a newer iteration of the DALL-E model designed for image generation. It can create images based on user prompts, making it a valuable tool for graphic designers, artists, and anyone looking to generate images quickly and efficiently. You can access the model through the image generation endpoint.
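For reference, a short sketch of the image generation endpoint (the prompt and size are illustrative):
from openai import OpenAI

client = OpenAI()

# Generate one image with DALL-E 3 and print the hosted URL of the result
result = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor illustration of a lighthouse at sunrise",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)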
GPT-4 Vision
GPT-4 with Vision enables you to ask questions about the contents of images. Visual question answering (VQA) is an important computer vision research field. You can also perform other vision tasks, such as Optical Character Recognition (OCR), where a model reads text in an image.
Using GPT-4 with Vision, you can ask questions about what is or is not in an image, how objects relate in an image, the spatial relationships between two objects (is one object to the left or right of another), the color of an object, and more.
GPT-4V is available through the OpenAI web interface for ChatGPT Plus subscribers and through their API. This expands the model's utility beyond the traditional text-only inputs, enabling it to be applied in a wider range of contexts. It handles images through the Chat Completions API, but note that the Assistants API does not support GPT-4V at this time.
GPT-4V supports advanced use cases like creating image captions, in-depth analysis of visual content, and interpreting text and graphics in documents.
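As a sketch of how an image question is sent through the Chat Completions API (the model name and image URL are illustrative):
from openai import OpenAI

client = OpenAI()

# Ask a question about an image by passing its URL alongside the text prompt
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)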
Hugging Face Inference API
Hugging Face (HF) offers a free service for testing and evaluating over 150,000 publicly available machine learning models hosted on their platform through their Inference API. They provide a wide range of models, including transformer and diffusion-based models, that can help solve various NLP or vision tasks such as text classification, sentiment analysis, named entity recognition, etc.
Steps to use the Inference API:
- Login to Hugging Face.
- Navigate to your profile on the top right navigation bar, then click “Edit profile.”
- Click on the "Access Tokens" menu item.
- Set the HF HUB API token:
os.environ['HUGGINGFACEHUB_API_TOKEN'] = getpass.getpass("Enter your Hugging Face API token: ")
- Use the HUGGINGFACEHUB_API_TOKEN as an environment variable:
import os
from huggingface_hub import HfApi
API_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN")
- Run the Inference API
Inference is the process of using a trained model to make predictions on new data. The huggingface_hub library provides an easy way to call a service that runs inference for hosted models. As described above, there are two types of services available.
- Inference API: run accelerated inference on Hugging Face’s infrastructure for free.
- Inference Endpoints: easily deploy models to production (paid)
- Choose a model from the Model Hub.
The model checkpoints are stored in the Model Hub; you can search and share them. Note that not all models are available on the Inference API.
Once you have chosen a model, its Inference API URL looks like the following:
ENDPOINT = https://api-inference.huggingface.co/models/<MODEL_ID>
- Run the inference.
import requests
API_URL = "https://api-inference.huggingface.co/models/<MODEL_ID>"
headers = {"Authorization": f"Bearer {API_TOKEN}"}
def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()
data = query("Can you please let us know more")
Hugging Face Tasks
The team at Hugging Face has categorized several models into the different tasks they can solve. You can find models for popular NLP tasks: Question Answering, Sentence Similarity, Summarization, Table Question Answering, and more.
Here is another example of using the Inference API for a summarization task.
import requests
import os
from huggingface_hub import HfApi
API_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN")
model_name = 'facebook/bart-large-cnn'
text_to_summarize = "Hugging Face's API simplifies accessing powerful NLP models for tasks like summarization, transforming verbose texts into concise, insightful summaries."
endpoint = f'https://api-inference.huggingface.co/models/{model_name}'
headers = {'Authorization': f'Bearer {API_TOKEN}'}
data = {'inputs': text_to_summarize}
response = requests.post(endpoint, headers=headers, json=data)
summarized_text = response.json()[0]['summary_text']
print(summarized_text)
We used a pre-trained model, facebook/bart-large-cnn, showcasing its ability to produce clear and concise summaries.
Note: Not all models are available in this Inference API. Verify if the model is available by reviewing its 'Model card.’
Sentiment analysis task:
import requests
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"
def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()
data = query({"inputs": "I love how this app simplifies complex tasks effortlessly . I'm frustrated by the frequent errors in the software's latest update"})
print(data)
Text-to-image task:
# run a few installations
!pip install diffusers["torch"] transformers
!pip install -U sentence-transformers
from diffusers import StableDiffusionPipeline
import torch
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
prompt = "Create an image of a futuristic cityscape on an alien planet, featuring towering skyscrapers with glowing neon lights, a sky filled with multiple moons, and inhabitants of various alien species walking through vibrant market streets"
image = pipe(prompt).images[0]
image.save("alien_cityscape.png")
Resulting image:
You can also encode a sentence and get text embeddings.
from sentence_transformers import SentenceTransformer
sentences = ["GAIA's questions are rooted in practical use cases, requiring AI systems to interact with a diverse and uncertain world, reflecting real-world applications.", " GAIA questions require accurate execution of complex sequences of actions, akin to the Proof of Work concept, where the solution is simple to verify but challenging to generate."]
model = SentenceTransformer('FacebookAI/xlm-roberta-large-finetuned-conll03-english', use_auth_token=API_TOKEN)
embeddings = model.encode(sentences)
print(embeddings)
[[ 0.76227915 -0.5500489 -1.5719271 ... -0.34034422 -0.27251056 0.12204967] [ 0.29783687 0.6476462 -2.0379746 ... -0.28033397 -1.3997376 0.25214267]]
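As a quick follow-up sketch, the two embeddings can be compared with cosine similarity (values closer to 1 indicate more similar sentences):
from sentence_transformers import util

# Cosine similarity between the two sentence embeddings
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(similarity.item())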
You can also experiment with image-captioning models:
from transformers import pipeline
image_to_text = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
image_to_text("https://ankur3107.github.io/assets/images/image-captioning-example.png")
# [{'generated_text': 'a soccer game with a player jumping to catch the ball '}]
You can also experiment with image classification models pre-trained on ImageNet:
from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import requests
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')
model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
# model predicts one of the 1000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
Predicted class: Egyptian cat
Here, we scrape a web page to get the article text and summarize it with a Hugging Face model using the Inference API.
pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup
# Function to fetch and clean the text content of a web page
def fetch_text_from_url(url):
    try:
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        text = soup.get_text()
        # Remove any excessive whitespace
        clean_text = ' '.join(text.split())
        return clean_text
    except requests.exceptions.RequestException as e:
        print(f"Error during request: {e}")
        return None

# Function to query the Inference API summarization model
def query_huggingface(payload):
    API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"
    headers = {"Authorization": f"Bearer {API_TOKEN}"}
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()
url = 'https://techcrunch.com/2023/11/25/neuralink-elon-musks-brain-implant-startup-quietly-raises-an-additional-43m/' # Replace with your desired URL
text_to_summarize = fetch_text_from_url(url)[:200]
# Summarize the text
summarization_payload = {
"inputs": text_to_summarize,
"parameters": {"do_sample": False},
}
summary_response = query_huggingface(summarization_payload)
print(summary_response)
[{'summary_text': 'Elon Musk-founded company raises $43 million in new venture capital. The company is developing implantable chips that can read brain waves. Critics say the company has a toxic workplace culture and unethical research practices. In June, Reuters reported that the company was valued at about $5 billion.'}]
Conclusion
In this lesson, we learned to use the OpenAI Assistants API, which enables tools like Code Interpreter and Knowledge Retrieval for enhanced functionality. Essential components like Threads and Messages facilitate user interaction, with the Assistant processing inputs and generating responses. We also demonstrated how an AI assistant can be deployed in a tech support example, utilizing these tools and methodologies for effective customer interaction.
We also explored Hugging Face's free Inference API, which offers many models that can solve different tasks. Through practical examples, we demonstrated how to authenticate, access models via the Model Hub, and perform various NLP tasks, highlighting the API's versatility and ease of use in handling complex AI challenges.
Through Function Calling, the OpenAI models can access the Hugging Face models via the free Inference API.
RESOURCES
- OpenAI API Docs
- Assistants API Colab notebook
- OpenAI Knowledge Retrieval
- Function Calling OpenAI