Creating a RAG AI Tutor Bot with TowardsAI & Activeloop

Abstract

This technical report describes the development and deployment of a Retrieval Augmented Generation (RAG) based AI Tutor that provides scalable, personalized support for students enrolled in the Gen AI 360 online courses. The solution uses Activeloop’s database for AI, 4th Gen Intel® Xeon® processors, and the Intel® oneAPI Math Kernel Library (oneMKL). Together, the Intel® hardware and software support real-time AI interactions and efficient cosine similarity computations, which are central to the RAG system's embedding retrieval.

Introduction

Gen AI 360, facilitated by Activeloop, Towards AI, and the Intel Disruptor Initiative, faces challenges in scaling student support across its extensive online courses. A RAG-based AI Tutor was developed to augment human tutors by providing immediate, accurate responses to student inquiries.

Problem Statement

Supporting thousands of students with human tutors alone does not scale. An AI-based solution is needed that delivers individualized support for routine queries while keeping answers relevant and accurate and reducing hallucinations.

The RAG AI Tutor Solution

A partnership between Towards AI, Activeloop, and Intel has resulted in a RAG AI Tutor that supports tens of thousands of students.

Knowledge Base

Content: Access to a library of ~4,000 AI tutorials and articles, including updated information on LLMs and articles from the Gen AI 360 courses.
Storage: 21,000 chunks of data (~500 words each) transformed into embedding vectors.
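
The chunking step described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the function name `chunk_words`, the whitespace-based word count, and the optional `overlap` parameter are all assumptions made for the example. The report only states that chunks are roughly 500 words each.

```python
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `chunk_size` whitespace-separated words.

    `overlap` (hypothetical, not mentioned in the report) repeats that many
    words from the end of one chunk at the start of the next.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    article = " ".join(f"word{i}" for i in range(1200))
    pieces = chunk_words(article)
    print([len(p.split()) for p in pieces])
```

Each resulting chunk would then be passed to the embedding model and stored alongside its metadata.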

Relevancy, Accuracy, and Hallucinations

Technique: Utilizes RAG and Deep Memory by Activeloop.
Implementation: The tutor answers questions strictly from the embedded knowledge base to reduce misinformation.
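
One common way to restrict answers to the knowledge base is to place the retrieved chunks in the prompt and instruct the model to refuse when they do not contain the answer. The sketch below assumes that approach; the instruction wording and the names `GROUNDING_INSTRUCTIONS` and `build_grounded_prompt` are hypothetical and are not taken from the AI Tutor's implementation.

```python
# Hypothetical instruction text; the AI Tutor's real prompt is not given in the report.
GROUNDING_INSTRUCTIONS = (
    "Answer using only the numbered sources below. "
    "If the sources do not contain the answer, reply: "
    "\"I don't know based on the course material.\""
)


def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that exposes only the retrieved chunks to the LLM."""
    sources = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        f"{GROUNDING_INSTRUCTIONS}\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


if __name__ == "__main__":
    retrieved = [
        "RAG combines a retriever with a generator.",
        "Embeddings map text to vectors.",
    ]
    print(build_grounded_prompt("What is RAG?", retrieved))
```

Numbering the sources also lets the model cite which chunk supports each statement, which makes answers easier to audit.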

Cost, Latency, and Scalability

Hardware: Utilization of 4th Gen Intel® Xeon® processors.
Optimization: Balancing the number of sources and data chunks to enhance speed and reduce costs.
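
The number of chunks passed to the LLM per query is the k in top-k retrieval: a larger k gives the model more context but increases prompt length, latency, and cost. The following sketch shows cosine-similarity top-k selection in plain Python so that it runs without dependencies. The function names are illustrative assumptions, and a production system would more likely use a vectorized library backed by an optimized math library such as oneMKL, or the vector database's own search.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Return indices of the k vectors most similar to the query, best first."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    # Sort by descending similarity; break ties by original index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    chunk_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(top_k([1.0, 0.1], chunk_vectors, k=2))
```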

Technology Stack

LLMs: OpenAI’s ada-002 embeddings and GPT-Turbo 16k LLM.
Intel Inside: 4th Gen Intel® Xeon® processors for low-latency LLM inference with Intel® AVX-512 and the Intel® oneAPI Math Kernel Library (oneMKL).
Data Storage: Activeloop Deep Lake is a database for AI that hosts embeddings and metadata and enables deep memory for better retrieval.

Technical Achievements

Compute Efficiency: A 22.93% increase in cosine similarity computation performance was achieved with 4th Gen Intel® Xeon® processors compared to 3rd Gen Intel® Xeon® processors.
Accuracy Improvement: Deep Memory delivered a 20% increase in recall@10 for embedding retrieval.
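
For readers unfamiliar with the metric, recall@10 measures how many of the relevant chunks for a query appear among the top 10 retrieved results. The sketch below assumes the common definition (per-query fraction of relevant items found in the top k, averaged over queries); the report does not specify the exact evaluation procedure, and the function name and sample data are hypothetical.

```python
def recall_at_k(
    retrieved: list[list[str]], relevant: list[set[str]], k: int = 10
) -> float:
    """Mean over queries of |top-k results ∩ relevant| / |relevant|.

    `retrieved[i]` is the ranked list of chunk ids returned for query i,
    and `relevant[i]` is the set of chunk ids judged relevant for it.
    Queries with no relevant ids are skipped.
    """
    if len(retrieved) != len(relevant):
        raise ValueError("retrieved and relevant must have the same length")

    per_query = []
    for ranked, gold in zip(retrieved, relevant):
        if not gold:
            continue
        hits = len(set(ranked[:k]) & gold)
        per_query.append(hits / len(gold))
    return sum(per_query) / len(per_query) if per_query else 0.0


if __name__ == "__main__":
    results = [["a", "b", "c"], ["d", "e", "f"]]
    ground_truth = [{"a"}, {"z"}]
    print(recall_at_k(results, ground_truth, k=2))
```

Comparing this value with and without a retrieval enhancement, on the same set of queries, gives the kind of before-and-after figure quoted above.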

Implementation in Educational Context

The RAG AI Tutor is integrated with Gen AI 360 course offerings, providing in-depth lessons and tutorials, practical coding projects, real-time AI assistance for technical Q&A, and access to updated AI-related content and community support.

Product Outcome

Response Time: The AI Tutor delivers answers with a 0.0243-second response time.
Accuracy: Pre-filtered knowledge sources and Deep Memory by Activeloop dramatically reduce the risk of hallucinations.
Efficiency: Demonstrated speed improvements and accurate query handling.

Future Steps

Deployment: Introduction of a Discord bot for real-time student interaction.
Optimization: Continual refinement of the data sources for the RAG model.
Experimentation: Potential fine-tuning of open-source LLMs and new embedding models.

Key Learnings

Integrated AI solutions are vital for scalable student support in online education.
The selection of hardware and database management systems is crucial for the performance and cost-effectiveness of AI applications.

Conclusion

The collaborative development of the RAG AI Tutor represents an advancement in AI-assisted education, focusing on scalability, accuracy, and speed. Implementing efficient hardware and vector databases is critical to the success of AI applications in education.

Disclaimers

Performance varies by use, configuration, and other factors. Learn more on the Performance Index site. Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details.

No product or component can be absolutely secure. Your costs and results may vary. For workloads and configurations, visit 4th Gen Xeon® Scalable processors at www.intel.com/processorclaims. Intel® technologies may require enabled hardware, software, or service activation. Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.