A quick start to RAG with a local setup

Imagine heading over to https://chatgpt.com/ and asking ChatGPT a bunch of questions. A pretty good way to pass a hot and humid Sunday afternoon, if you ask me. But what if you had a bunch of your own documents you wanted to decipher? Perhaps they are your lecture notes from CS2040. Now ask the LLM a question: “What did the professor highlight about linked lists in Lecture 4?”

The model spits out a bunch of generic, possibly made-up information, because it has never seen your notes. Now suppose someone magically types the relevant notes (read: context) into the prompt; with that context in hand, the model can actually answer the question.

You have to admit it is impractical to hand-feed the model context every single time. Is someone always going to have to type it out? Well, you are in luck: Retrieval-Augmented Generation (RAG) automates exactly this step. The idea is that your query is vectorised and used to search a pre-vectorised set of documents in a database, retrieving the top few matches according to a similarity metric. Those matches are handed to the LLM as context for answering the question. What are we waiting for? Let's get started!
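To make the retrieval idea concrete before we touch any tooling, here is a minimal sketch of that search step using plain NumPy and cosine similarity. The names here (retrieve, doc_vecs and so on) are made up for illustration; the actual walkthrough below uses Ollama embeddings and a FAISS index instead.

import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=3):
    # Cosine similarity between the query vector and every stored chunk vector
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    top = np.argsort(sims)[::-1][:k]  # indices of the k most similar chunks
    return [docs[i] for i in top]     # these chunks become the LLM's context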
Start Ollama
First, let's get the Ollama server up and running. Ollama is a tool that lets us download and run models locally for testing.
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
You will know this is successful when you head over to localhost:11434 and see the words “Ollama is running”.
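If you prefer the terminal, a quick curl gives the same confirmation:
curl localhost:11434
# prints: Ollama is running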
Start a python environment
Any Python environment is fine, but I personally enjoy using the langchain/langchain image.
docker pull langchain/langchain
docker run -it --network host langchain/langchain sh
Install dependencies
apt update
apt upgrade -y
apt install vim tmux -y
Create a working directory
mkdir test && cd test
apt install python3.9-venv -y
python3.9 -m venv .venv
source .venv/bin/activate
Install python packages
pip install ollama PyPDF2 numpy faiss-cpu
Note that copy does not need to be installed; it is part of the Python standard library.
Transfer the PDF from your computer to the Docker container
Download a PDF. It can be any PDF from anywhere; this walkthrough uses 2210.03629v3.pdf, a paper grabbed from arXiv.
# This command should be run on your local computer NOT in the docker container
# Note that the container id should be that of the langchain/langchain container, which you can find with docker container ls
docker cp 2210.03629v3.pdf <container_id>:test
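Back in the container shell, a quick listing should now show the PDF (if docker cp dropped it somewhere other than your test directory, adjust the path passed to load_pdf below accordingly):
ls
# 2210.03629v3.pdf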
Add the boilerplate code
import ollama
import PyPDF2
import numpy as np
import faiss
import copy
ollama.pull("llama3.2:1b")
ollama.pull("all-minilm")
# Load the PDF
def load_pdf(file_path):
    with open(file_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        text = ""
        for page in reader.pages:
            # extract_text() can return None for image-only pages
            text += (page.extract_text() or "") + "\n"
    return text

def get_embeddings(text):
    # Use the embeddings method from Ollama
    embeddings = ollama.embeddings(
        model="all-minilm",
        prompt=f"{text}"
    )
    # FAISS expects float32 vectors
    return np.array(embeddings["embedding"], dtype=np.float32)

# Initialize FAISS index
def create_faiss_index(embeddings):
    index = faiss.IndexFlatL2(embeddings.shape[1])  # Dimensionality of embeddings
    index.add(embeddings)
    return index

if __name__ == "__main__":
    pdf_text = load_pdf('2210.03629v3.pdf')

    # Split the text into manageable chunks if necessary
    text_chunks = pdf_text.split('\n')  # Splitting by lines; swap in smarter chunking if you like
    print(f"chunks: {text_chunks}")
    text_chunks = [chunk for chunk in text_chunks if chunk.strip()]  # Drop empty chunks, including the trailing one from split
    text_chunks_copy = copy.deepcopy(text_chunks)  # Create a deep copy of text_chunks
    print(f"chunks: {text_chunks_copy}")

    # Get embeddings for all text chunks
    embeddings = np.vstack([get_embeddings(chunk) for chunk in text_chunks])  # Stack into a 2D array

    # Create FAISS index
    faiss_index = create_faiss_index(embeddings)

    # Perform a query (for demonstration)
    query = "Your query here"
    query_embedding = get_embeddings(query)
    D, I = faiss_index.search(query_embedding.reshape(1, -1), k=5)  # Top 5 results
    print(f"The value of I is: {I}")

    # Retrieve the actual text corresponding to the indices
    context_texts = []
    for idx in I[0]:
        print(f"the chunk is: {text_chunks_copy[idx]} and the index is: {idx}")
        context_texts.append(text_chunks_copy[idx])
    context = " ".join(context_texts)  # Join the texts into a single string
    print(f"The context is: {context}")

    question = "What is the gist of this paper?"
    response = ollama.chat(model='llama3.2:1b', messages=[
        {
            'role': 'user',
            'content': f'answer this question: {question} based on the context: {context}. Do not deviate.',
        },
    ])
    print(response['message']['content'])

    # Clean up the models we pulled at the start
    ollama.delete("llama3.2:1b")
    ollama.delete("all-minilm")
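Save the script inside the test directory (the filename main.py below is just a placeholder, use whatever you like) and run it with the virtual environment still activated. Each run pulls the two models up front and deletes them at the end, so give it a minute:
python main.py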
Code Explained
There are a few broad steps.
- Read the PDF
pdf_text = load_pdf('2210.03629v3.pdf')
- Chunk the text
text_chunks = pdf_text.split('\n')
- Embed the chunks
embeddings = np.vstack([get_embeddings(chunk) for chunk in text_chunks])
- Search for the query
- Pass the context to the LLM
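The last two steps correspond to these lines from the script above:
D, I = faiss_index.search(query_embedding.reshape(1, -1), k=5)  # Search for the query
response = ollama.chat(model='llama3.2:1b', messages=[
    {'role': 'user', 'content': f'answer this question: {question} based on the context: {context}. Do not deviate.'},
])  # Pass the context to the LLM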
This is part of a larger series. Check out previous articles if this is your cup of tea.
Get started with Agentic workflows
parkerrobert.medium.com
