Embarking on the AI Adventure Part 4: Building a Simplified RAG System

Ahmed Ibrahim
JavaScript in Plain English
7 min read · Feb 27, 2024


In the previous article, we described how we can create a simple chatbot and explored the differences between some LangChain memory types.

In this article, we will introduce what RAG is and build a simplified RAG system with the help of LangChain and Chroma DB.

What is RAG?

Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response. Large Language Models (LLMs) are trained on vast volumes of data and use billions of parameters to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an organization’s internal knowledge base, all without the need to retrain the model.

In our case, we will create a simple RAG pipeline: load a text file, split it into chunks, embed the chunks, store them in a vector database, and then, at query time, retrieve the most relevant chunks and pass them to the chat model as context.

For simplicity, we won't create file upload functionality. We will add the data file to the code and use its path.

What is Embedding?

An embedding is a vector representation of a piece of data, produced by a deep learning model, that can be compared to other vectors for similarity search.

After creating the embeddings, we will store them in a vector database so we can query them using similarity search.
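
To make this concrete, here is a minimal sketch (not part of the original article) that embeds two sentences with OpenAIEmbeddings and compares them with cosine similarity; it assumes the same OPENAI_KEY environment variable used later in this article.

import { OpenAIEmbeddings } from "@langchain/openai";

// Cosine similarity between two vectors: values close to 1 mean very similar meaning.
const cosineSimilarity = (a: number[], b: number[]): number => {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, v) => sum + v * v, 0));
  const normB = Math.sqrt(b.reduce((sum, v) => sum + v * v, 0));
  return dot / (normA * normB);
};

const embeddings = new OpenAIEmbeddings({ openAIApiKey: process.env.OPENAI_KEY });

// embedQuery returns the embedding vector (an array of numbers) for a single string.
const [first, second] = await Promise.all([
  embeddings.embedQuery("A cat is sleeping on the couch"),
  embeddings.embedQuery("A kitten naps on the sofa"),
]);

console.log(cosineSimilarity(first, second)); // similar sentences score close to 1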

You can follow our code: https://github.com/Ahmedhemaz/langChain-Articles

Loading Data to ChromaDB:

1- Create a new branch

git checkout -b rag

2- Add Chroma DB to our docker-compose file

version: "3"
services:
  api:
    build:
      dockerfile: Dockerfile.dev
      context: ./
    volumes:
      - ./:/usr/app
      - /usr/app/node_modules
    env_file:
      - ./.env
    ports:
      - "3000:3000"
    entrypoint: ./entry-point.sh
    extra_hosts:
      - "host.docker.internal:host-gateway"
  db:
    image: chromadb/chroma
    ports:
      - 8000:8000
    volumes:
      - chroma-data:/chroma/chroma
volumes:
  chroma-data:
    driver: local
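
The code in the next steps reads OPENAI_KEY, DATABASE_HOST, and DATABASE_PORT from the environment. The original article does not show the .env file, so the following is only a sketch of what it could look like; with this compose file, the API container reaches Chroma through the db service name:

# .env (illustrative values; the real file is not shown in the original article)
OPENAI_KEY=sk-...
DATABASE_HOST=db
DATABASE_PORT=8000
PORT=3000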

3- Add a facts file, facts.txt, at the root level of the project

4- Create the data-loading functionality that loads the data file, embeds it, and stores the result in our vector database.

import { Chroma } from "@langchain/community/vectorstores/chroma";
import { OpenAIEmbeddings } from "@langchain/openai";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { CharacterTextSplitter } from "langchain/text_splitter";

export const loadFileToVectorDb = async () => {
  // Load the raw text file into memory as a LangChain document.
  const loader = new TextLoader("./facts.txt");
  const docs = await loader.load();

  // Split the document into ~200-character chunks on newlines.
  const splitter = new CharacterTextSplitter({
    separator: "\n",
    chunkSize: 200,
    chunkOverlap: 0,
  });
  const chunks = await splitter.createDocuments([docs[0].pageContent]);

  // Embed the chunks with OpenAI and store them in Chroma.
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_KEY,
  });

  const vectorStore = await Chroma.fromDocuments(chunks, embeddings, {
    collectionName: "facts-collection",
    url: `http://${process.env.DATABASE_HOST}:${process.env.DATABASE_PORT}`, // note the "http://" prefix
    collectionMetadata: {
      "hnsw:space": "cosine",
    },
  });

  return vectorStore;
};

Code explanation:

const loader = new TextLoader("./facts.txt");
const docs = await loader.load();

In these steps, we load facts.txt into memory.

const splitter = new CharacterTextSplitter({
  separator: "\n",
  chunkSize: 200,
  chunkOverlap: 0,
});

Here we create the text chunks: each chunk is split on the "\n" separator and limited to about 200 characters, and chunkOverlap can be used to create an overlapping section between consecutive chunks.

An example of overlapping chunks:

first chunk:

The speed of light is generally rounded down to 186,000 miles per second. In exact terms it is 299,792,458 m/s.
It takes 8 minutes 17 seconds for light to travel from the Sun’s surface to the Earth.
October 12th, 1999 was declared “The Day of Six Billion” based on United Nations projections.

second chunk:

October 12th, 1999 was declared “The Day of Six Billion” based on United Nations projections.
10 percent of all human beings ever born are alive at this very moment.
The Earth spins at 1,000 mph but it travels through space at an incredible 67,000 mph.

We can use that overlap to preserve context and keep related chunks linked to each other.
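
Note that the splitter above uses chunkOverlap: 0, so the overlap shown here is only an illustration of the concept. As a rough sketch, overlap can be enabled by passing a non-zero chunkOverlap (the value 100 below is an arbitrary choice):

const overlappingSplitter = new CharacterTextSplitter({
  separator: "\n",
  chunkSize: 200,
  chunkOverlap: 100, // consecutive chunks may share up to ~100 characters
});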

const embeddings = new OpenAIEmbeddings({
  openAIApiKey: process.env.OPENAI_KEY,
});

const vectorStore = await Chroma.fromDocuments(chunks, embeddings, {
  collectionName: "facts-collection",
  url: `http://${process.env.DATABASE_HOST}:${process.env.DATABASE_PORT}`, // note the "http://" prefix
  collectionMetadata: {
    "hnsw:space": "cosine", // optional; specifies the distance metric of the embedding space
  },
});

Here we embed the chunks using OpenAI embeddings and store them in our vector database.

For the available distance metrics ("hnsw:space"), refer to the Chroma DB documentation: https://docs.trychroma.com/

5- Create an endpoint that triggers loading the data into our vector store with embeddings

import express from "express";
import dotenv from "dotenv";
import { loadFileToVectorDb } from "./file-data-loader.js";

dotenv.config();

const app: express.Express = express();
const port: string | number = process.env.PORT || 3000;
app.use(express.json());

app.listen(port, () => {
  console.log(`Server is listening on port ${port}`);
});

app.post("/load-file-to-vdb", async (req, res, next) => {
  await loadFileToVectorDb();

  res.status(200).send({
    message: "loaded",
  });
});

6- Let's execute that endpoint
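
With the containers running (docker-compose up), the endpoint can be triggered with any HTTP client. Here is a minimal sketch using Node's built-in fetch (my own example, ports as configured above):

// Run from any Node 18+ script while the API container is up.
const res = await fetch("http://localhost:3000/load-file-to-vdb", { method: "POST" });
console.log(await res.json()); // { message: "loaded" }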

It seems to have worked. To make sure, let's query Chroma DB and fetch some records to see how documents and embeddings are stored.

Query Stored Documents and Embeddings:

1- Create a query file, get-collection-data.ts

import { ChromaClient, IncludeEnum } from "chromadb";

export const getCollectionData = async (collectionName: string) => {
  const client = new ChromaClient({
    path: `http://${process.env.DATABASE_HOST}:${process.env.DATABASE_PORT}`,
  });

  const collection = await client.getCollection({
    name: collectionName,
  });

  // Ask Chroma to return the stored documents, their embeddings, and metadata.
  const data = await collection.get({
    include: [IncludeEnum.Documents, IncludeEnum.Embeddings, IncludeEnum.Metadatas],
  });

  return data;
};

2- Add a new endpoint in server.ts

app.get("/get-file-data", async (req, res, next) => {
  // "facts-collection" must match the collection name created by loadFileToVectorDb.
  const loadedData = await getCollectionData("facts-collection");
  res.status(200).send({
    loadedData,
  });
});

3- Execute the query to see the results:

The embeddings themselves are stored as long arrays of floating-point numbers.

Doing a Similarity Search from the User's Input:

1- Create a new file, get-related-data.ts

import { Chroma } from "@langchain/community/vectorstores/chroma";
import { OpenAIEmbeddings } from "@langchain/openai";

export interface GetRelatedDataInput {
  input: string;
}

export const getRelatedData = async (params: GetRelatedDataInput) => {
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_KEY,
  });

  // Connect to the collection we populated earlier instead of creating a new one.
  const vectorStore = await Chroma.fromExistingCollection(embeddings, {
    collectionName: "facts-collection",
    url: `http://${process.env.DATABASE_HOST}:${process.env.DATABASE_PORT}`,
    collectionMetadata: {
      "hnsw:space": "cosine",
    },
  });

  // Return the single most similar chunk for the user's input.
  const data = await vectorStore.similaritySearch(params.input, 1);

  return data;
};

2- Create a new endpoint to execute the similarity search with the user's input

app.post("/get-related-data", async (req, res, next) => {
  const data = await getRelatedData(req.body);
  res.status(200).send({
    data,
  });
});

3- Execute the query

It seems we did not chunk the document properly.
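
The original screenshot of the result is not shown here, but one possible way to improve the chunking (my assumption, not a fix from the original article) is to tighten the splitter so each chunk holds roughly one fact:

const splitter = new CharacterTextSplitter({
  separator: "\n",
  chunkSize: 100, // smaller chunks keep roughly one fact per chunk (illustrative value)
  chunkOverlap: 0,
});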

Putting It All Together:

We will now put everything together: using what we learned about creating a chatbot, we will make the bot answer only from the data we provide to it.

1- Create a file, get-related-data-as-retrieval.ts

import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { Chroma } from "@langchain/community/vectorstores/chroma";
import {
  SystemMessagePromptTemplate,
  HumanMessagePromptTemplate,
  ChatPromptTemplate,
} from "langchain/prompts";
import { RunnablePassthrough, RunnableSequence } from "@langchain/core/runnables";
import { formatDocumentsAsString } from "langchain/util/document";
import { StringOutputParser } from "@langchain/core/output_parsers";

export interface GetRelatedDataAsRetrieval {
  input: string;
}

const embeddings = new OpenAIEmbeddings({
  openAIApiKey: process.env.OPENAI_KEY,
});

const chatModel = new ChatOpenAI({
  openAIApiKey: process.env.OPENAI_KEY,
});

const vectorStore = await Chroma.fromExistingCollection(embeddings, {
  collectionName: "facts-collection", // must match the collection created by loadFileToVectorDb
  url: `http://${process.env.DATABASE_HOST}:${process.env.DATABASE_PORT}`,
  collectionMetadata: {
    "hnsw:space": "cosine",
  },
});

// Expose the vector store as a retriever so it can be plugged into a chain.
const retrieval = vectorStore.asRetriever();

export const getRelatedDataAsRetrieval = async (params: GetRelatedDataAsRetrieval) => {
  const SYSTEM_TEMPLATE = `Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
{context}`;

  const messages = [
    SystemMessagePromptTemplate.fromTemplate(SYSTEM_TEMPLATE),
    HumanMessagePromptTemplate.fromTemplate("{input}"),
  ];

  const prompt = ChatPromptTemplate.fromMessages(messages);

  // The chain: retrieve relevant chunks as {context}, pass the question through as {input},
  // fill the prompt, call the chat model, and parse the reply into a plain string.
  const chain = RunnableSequence.from([
    {
      context: retrieval.pipe(formatDocumentsAsString),
      input: new RunnablePassthrough(),
    },
    prompt,
    chatModel,
    new StringOutputParser(),
  ]);

  const answer = await chain.invoke(params.input);

  return answer;
};

2- Add an endpoint in server.ts

app.post("/get-related-data-as-retrieval", async (req, res, next) => {
  const data = await getRelatedDataAsRetrieval(req.body);
  res.status(200).send({
    data,
  });
});

3- Execute the endpoint

4- Let's ask it about something that is not in our data store

It didn't answer it; as instructed by the system prompt, it says it doesn't know rather than making up an answer.
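
As an illustration (these questions are my own examples, not the ones from the original screenshots), the retrieval endpoint can be exercised like this:

// A question covered by facts.txt is answered from the retrieved context.
const inScope = await fetch("http://localhost:3000/get-related-data-as-retrieval", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ input: "How fast does light travel?" }),
});
console.log(await inScope.json());

// A question outside the file should get an "I don't know" style reply,
// because the system prompt forbids making up an answer.
const outOfScope = await fetch("http://localhost:3000/get-related-data-as-retrieval", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ input: "Who won the 2010 World Cup?" }),
});
console.log(await outOfScope.json());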

Conclusion:

In this article, we explored building a Retrieval-Augmented Generation (RAG) system, enhancing a language model with external knowledge. We learned about embeddings, storing data in Chroma DB, and integrating it with a chatbot. Despite some challenges, we laid the foundation for leveraging external knowledge to enrich language model responses.

References:

1- https://aws.amazon.com/what-is/retrieval-augmented-generation/

2- https://www.promptingguide.ai/applications/synthetic_rag

3- https://js.langchain.com/docs/get_started/introduction

4- https://docs.trychroma.com/

5- https://www.cloudflare.com/learning/ai/what-are-embeddings/
