A quick start to RAG with a local setup

Imagine heading over to https://chatgpt.com/ and asking ChatGPT a bunch of questions. A pretty good way to pass a hot and humid Sunday afternoon, if you ask me. But what if you had a pile of documents you wanted to decipher? Perhaps they’re your lecture notes from CS2040. Now ask the LLM a question: “What did the professor highlight about linked lists in Lecture 4?”

The model spits out a bunch of random information, because it has never seen your notes. Now let’s say someone magically types the relevant information (read: context) into the prompt to help the model out, and suddenly the answer is actually about your lecture.

You have to admit it’s naive to expect someone to hand the model its context every single time. Is a human always going to have to type it out? Well, you’re in luck: Retrieval-Augmented Generation (RAG) does exactly that for you. The idea is that your query is vectorised and used to search a pre-vectorised set of information in a database, retrieving the top few matches according to a similarity measure. These matches are handed back to the LLM as context for it to answer the question. What are we waiting for? Let’s get started!
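To make that flow concrete, here is a minimal sketch of the retrieve-then-generate loop against Ollama’s REST API, skipping the vector database and keeping everything in memory. It assumes the Ollama server we start in the next section is running on localhost:11434, that an embedding model and a chat model have already been pulled (nomic-embed-text and llama3 are just example names), and that the requests package is available. The documents are made up for illustration.

import math
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list[float]:
    # Vectorise a piece of text with an embedding model served by Ollama.
    resp = requests.post(f"{OLLAMA}/api/embeddings",
                         json={"model": "nomic-embed-text", "prompt": text})
    return resp.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Pre-vectorise the "database" of documents (your lecture notes, say).
docs = [
    "Lecture 4: linked lists give O(1) insertion but O(n) search.",
    "Lecture 5: stacks and queues can be built on top of linked lists.",
    "Lecture 6: binary search only works on sorted arrays.",
]
doc_vectors = [embed(d) for d in docs]

# Vectorise the query and retrieve the top matches by cosine similarity.
query = "What did the professor highlight about linked lists in Lecture 4?"
q_vec = embed(query)
ranked = sorted(zip(docs, doc_vectors), key=lambda dv: cosine(q_vec, dv[1]), reverse=True)
context = "\n".join(doc for doc, _ in ranked[:2])

# Hand the retrieved matches to the LLM as context.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
answer = requests.post(f"{OLLAMA}/api/generate",
                       json={"model": "llama3", "prompt": prompt, "stream": False})
print(answer.json()["response"])

A real setup would persist the vectors in a vector store instead of recomputing them on every run, but the shape of the flow stays the same.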
Start Ollama
First, let’s get the Ollama server up and running. Ollama is a tool that lets us download and run models locally.
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
You will know this is successful when you head over to localhost:11434 and see the words “Ollama is running”.
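One more thing before moving on: the server starts with no models on disk, so you need to pull at least one. The usual way is ollama pull inside the container (docker exec -it ollama ollama pull llama3, for example), but as a sketch you can also do it through the REST API. The model names below are only examples, and I’m assuming the requests package is available wherever you run this.

import json
import requests

OLLAMA = "http://localhost:11434"

# Quick sanity check: the root endpoint should answer "Ollama is running".
print(requests.get(OLLAMA).text)

# Pull a chat model and an embedding model (example names, pick your own).
for model in ["llama3", "nomic-embed-text"]:
    with requests.post(f"{OLLAMA}/api/pull", json={"model": model}, stream=True) as r:
        for line in r.iter_lines():
            if line:
                print(json.loads(line).get("status"))

# List what is now available locally.
print(requests.get(f"{OLLAMA}/api/tags").json())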
Start a Python environment
Any Python environment is fine, but I personally enjoy using the langchain/langchain Docker image. The --network host flag in the run command below lets the container reach the Ollama server on localhost:11434.
docker pull langchain/langchain
docker run -it --network host langchain/langchain sh
Install dependencies
apt update
apt upgrade -y
apt install vim tmux -y