LLM RAG Demonstration
This is a demonstration of a Retrieval-Augmented Generation (RAG) system using a Large Language Model (LLM).
The system is designed to retrieve relevant information from a knowledge base and generate responses based on that information.
The goal is to provide accurate and contextually relevant answers to user queries by leveraging the power of LLMs and a retrieval system.
Ideally this would utilize a vector database to store the knowledge base and allow for efficient retrieval of relevant information.
In this demo, the knowledge base is a list of small strings, and retrieval is done using the cosine similarity equation.
Your prompt is modified to instruct the LLM to be a chatbot to assist users with finding the appropriate team to escalate to.
How Does it Work?
The system takes the user prompt and uses the Natural Language Toolkit (NLTK) to remove common stop words, then splits the prompt into a list of words on spaces.
It then does the same with the documents. Normally this would be done when ingesting into a vector database, but that is still to come.
It then forms the combined vocabulary of each document and the prompt, and counts how many times each vocabulary word appears in each original list, producing a vector for each.
The final step is running each pair of vectors through the cosine similarity equation. Once there is a dictionary mapping document indexes to cosine scores, the most relevant documents can be fetched and passed to the LLM along with your prompt.
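The steps above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the demo's exact code: the hard-coded stop-word set stands in for NLTK's corpus so the sketch runs on its own, and the function names are my own.

```python
import math
import re

# Stand-in for NLTK's English stopword list (illustrative subset).
STOPWORDS = {"the", "a", "an", "and", "on", "like", "to", "with", "who"}

def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and drop common stop words."""
    return [w for w in re.split(r"\s+", text.lower()) if w and w not in STOPWORDS]

def cosine_similarity(prompt_tokens: list[str], doc_tokens: list[str]) -> float:
    """Count-vectorize both token lists over their combined vocabulary
    and compute the cosine of the angle between the two vectors."""
    vocab = sorted(set(prompt_tokens) | set(doc_tokens))
    v1 = [prompt_tokens.count(w) for w in vocab]
    v2 = [doc_tokens.count(w) for w in vocab]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0

documents = [
    "TMV team handles video playback",
    "VOPS team handles video on demand issues",
]
prompt = "handles video playback problems"

# Dictionary of document index -> cosine score, as described above.
scores = {i: cosine_similarity(tokenize(prompt), tokenize(d))
          for i, d in enumerate(documents)}
top = max(scores, key=scores.get)  # index of the most relevant document
```

Here the TMV document scores highest because it shares the most surviving tokens ("handles", "video", "playback") with the prompt.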
Documents
Normally the documents would be ingested into a vector database and the similarities compared against it. For the time being, we are using a simple list of strings, converted to vectors for comparison.
For demonstration purposes the documents are very small, each one containing the name of a team and very brief description of their responsibilities:
TMV team handles video playback
VOPS team handles video on demand issues
SMT team handles app navigation issues
IPTMS team handles authentication and authorization issues like login, usernames and passwords
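In code, the knowledge base above could simply be a Python list of strings, with each entry tokenized up front as a stand-in for vector-database ingestion. A minimal sketch (the stop-word set is an illustrative subset of NLTK's, not the demo's exact code):

```python
# Stand-in for NLTK's English stopword list (illustrative subset).
STOPWORDS = {"the", "a", "an", "and", "on", "like"}

# The knowledge base: one short string per team.
documents = [
    "TMV team handles video playback",
    "VOPS team handles video on demand issues",
    "SMT team handles app navigation issues",
    "IPTMS team handles authentication and authorization issues "
    "like login, usernames and passwords",
]

# Pre-tokenize each document once, as a vector database would at ingestion.
doc_tokens = [
    [w for w in doc.lower().split() if w not in STOPWORDS]
    for doc in documents
]
```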
LLM Query
You may choose to use either OpenRouter or OpenAI to test this RAG.
To use this demonstration you will need a message, a model and an API token.
If the model field is left blank, the default is 'mistralai/mistral-small-3.1-24b-instruct:free' for OpenRouter and 'gpt-3.5-turbo-0125' for OpenAI. I recommend using OpenRouter for this demo as it offers free models over the API.
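Both providers speak the same OpenAI-style chat-completions format, so the demo only needs to fold the retrieved documents into the prompt before sending it. A sketch of how that assembly might look, with the caveat that the system-prompt wording and the helper name here are my illustrations, not the demo's exact text:

```python
# Illustrative sketch: fold retrieved documents into an OpenAI-style
# chat payload. The system-prompt wording below is an assumption.
def build_messages(retrieved_docs: list[str], user_prompt: str) -> list[dict]:
    context = "\n".join(retrieved_docs)
    system_prompt = (
        "You are a support chatbot that helps users find the right team "
        "to escalate to. Use only the following team descriptions:\n" + context
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages(
    ["TMV team handles video playback"],
    "My video won't play, who should I contact?",
)
# This list is what would be passed as `messages` to the chat-completions
# endpoint (e.g. the `openai` client pointed at OpenRouter's base URL),
# along with the chosen model name and your API token.
```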
Note that your token is not saved anywhere; it is simply passed through to the connection, as the source code in my GitLab project (linked in the footer) shows.
For extra caution, though, I do recommend generating a new token for this demo and deleting it once you are finished.
Please ask the bot about which team to escalate to, using wording similar to the documents above.