
Posted by Jesse JCharis
Feb. 23, 2025, 5:39 a.m.
Enhancing Language Models with RAG in Python
Large Language Models (LLMs) like GPT-4 and BART excel at generating human-like text but often struggle with factual accuracy or accessing up-to-date information not present in their training data. Retrieval-Augmented Generation (RAG) addresses this limitation by combining retrieval from external knowledge sources with generative models to produce more accurate and contextually relevant outputs.
In this guide, we’ll implement a RAG system in Python using Hugging Face’s transformers, sentence-transformers, and Facebook’s FAISS for efficient similarity search.
What Is Retrieval-Augmented Generation (RAG)?
RAG integrates two components:
- Retriever: Searches external datasets or knowledge bases to fetch relevant information.
- Generator: An LLM that synthesizes retrieved information into coherent answers.
This approach enhances LLMs by grounding responses in dynamically retrieved facts rather than relying solely on pre-trained knowledge.
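To make the two roles concrete before bringing in any libraries, here is a minimal toy sketch. It is not the implementation used later in this guide: the names toy_retrieve and toy_generate are hypothetical, the retriever just counts shared words, and the "generator" only builds the prompt a real LLM would receive.

```python
import re

def tokenize(text):
    """Lowercase text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_retrieve(query, documents, k=1):
    """Hypothetical retriever: rank documents by number of words shared with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def toy_generate(query, context_docs):
    """Hypothetical generator stand-in: returns the prompt an LLM would be given."""
    context = " ".join(context_docs)
    return f"question: {query} context: {context}"

docs = [
    "Lions are big cats living in Africa and India.",
    "Penguins are flightless birds inhabiting Antarctica.",
]
top = toy_retrieve("Where do penguins live?", docs)
prompt = toy_generate("Where do penguins live?", top)
print(prompt)
```

The rest of the guide swaps the word-overlap scorer for dense embeddings and the prompt builder for a real generative model, but the retrieve-then-generate flow stays the same.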
Implementing RAG in Python
Step 1: Install Dependencies
!pip install transformers sentence-transformers faiss-cpu datasets
Step 2: Prepare Sample Data
For simplicity, we’ll use manually curated passages about animals:
passages = [
    "Elephants are large mammals known for their long trunks and tusks. The average lifespan of an African elephant is 60-70 years in the wild.",
    "Lions are big cats living in Africa and India. They survive 10-14 years in the wild.",
    "Penguins are flightless birds inhabiting Antarctica. Emperor penguins can live up to 20 years.",
]
Step 3: Encode Passages and Build a FAISS Index
We’ll use sentence-transformers to encode text into embeddings and FAISS for fast retrieval:
from sentence_transformers import SentenceTransformer
import faiss
# Encode passages into vectors
encoder = SentenceTransformer("all-MiniLM-L6-v2") # Lightweight embedding model
embeddings = encoder.encode(passages)
# Create FAISS index
dim = embeddings.shape[1]
index = faiss.IndexFlatL2(dim)
index.add(embeddings)
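For intuition about what a flat L2 index does, the sketch below performs the same kind of exhaustive search in plain Python: it compares the query against every stored vector and keeps the k smallest squared Euclidean distances. The function names and toy vectors are made up for illustration; FAISS does this far more efficiently on real embeddings.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_l2_search(vectors, query, k):
    """Exhaustively score every vector and return (distances, indices) of the k nearest."""
    scored = sorted((squared_l2(vec, query), i) for i, vec in enumerate(vectors))
    top = scored[:k]
    return [dist for dist, _ in top], [i for _, i in top]

# Toy 2-D "embeddings" (illustrative values only)
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
distances, indices = flat_l2_search(toy_vectors, [0.9, 0.1], k=2)
print(indices)
```

Because every vector is scanned, results are exact, but the cost grows linearly with the number of passages. That is fine for three passages and is the reason larger collections usually move to approximate indexes.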
Step 4: Define Retriever Function
This function retrieves top-k relevant passages for a query:
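def retrieve(query_embedding, k=2):
    # index.search returns (distances, indices), each of shape (n_queries, k)
    distances, indices = index.search(query_embedding, k)
    # FAISS pads with -1 when fewer than k vectors are indexed; skip those
    return [passages[i] for i in indices[0] if i != -1]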
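A top-k search always returns the nearest passages, even when none of them is actually related to the query. One possible refinement is a distance cutoff. The sketch below is an assumption layered on top of the guide, not part of it: filter_hits and max_distance are hypothetical names, the threshold would need tuning on real data, and the numbers are made up to mimic one row of index.search output.

```python
def filter_hits(distances, indices, passages, max_distance):
    """Keep (passage, distance) pairs within max_distance, skipping -1 padding."""
    kept = []
    for dist, idx in zip(distances, indices):
        if idx == -1:  # FAISS uses -1 when it has fewer than k results
            continue
        if dist <= max_distance:
            kept.append((passages[idx], dist))
    return kept

sample_passages = ["elephant passage", "lion passage", "penguin passage"]
# Made-up distances and indices shaped like one row of search results
hits = filter_hits([0.4, 1.7, 3.4e38], [1, 0, -1], sample_passages, max_distance=1.0)
print(hits)
```

If the filtered list comes back empty, the pipeline can decline to answer instead of passing unrelated context to the generator.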
Step 5: Initialize Generator Model
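We’ll use BART, loaded through Hugging Face’s transformers pipeline, for text generation. Note that facebook/bart-large-cnn is fine-tuned for summarization, so it tends to condense the retrieved context rather than answer the question directly:
from transformers import pipeline
generator = pipeline(
    "text2text-generation",
    model="facebook/bart-large-cnn"  # Summarization-focused model
)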
Step 6: Build the RAG Pipeline
Combine retrieval and generation into one workflow:
def rag_pipeline(query):
    # Retrieve relevant documents
    query_embedding = encoder.encode([query])
    retrieved_docs = retrieve(query_embedding)
    # Format input for generator
    context = " ".join(retrieved_docs)
    input_text = f"question: {query} context: {context}"
    # Generate answer
    answer = generator(input_text, max_length=100)
    return answer[0]["generated_text"]
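The pipeline above joins every retrieved passage into the prompt, which works for short passages but can exceed the generator's input limit with longer documents. As a rough sketch of one way to cap the context, the helper below stops adding passages once a word budget is reached. build_prompt and max_context_words are hypothetical names, and word count is only a crude proxy for the model's token count.

```python
def build_prompt(query, docs, max_context_words=150):
    """Join docs into the prompt format used above, stopping before the word budget is exceeded."""
    selected, used = [], 0
    for doc in docs:
        n_words = len(doc.split())
        if used + n_words > max_context_words:
            break  # later docs rank lower, so drop them first
        selected.append(doc)
        used += n_words
    context = " ".join(selected)
    return f"question: {query} context: {context}"

prompt = build_prompt(
    "Where do lions live?",
    [
        "Lions are big cats living in Africa and India.",
        "Penguins are flightless birds inhabiting Antarctica.",
    ],
    max_context_words=10,
)
print(prompt)
```

Since the retriever returns passages in order of relevance, trimming from the end keeps the most relevant context.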
Testing the RAG System
Let’s ask questions requiring factual knowledge:
Example 1: Lifespan of Elephants
print(rag_pipeline("What is the average lifespan of an African elephant?"))
# Example output (exact wording may vary by model version): "The average lifespan of an African elephant is 60-70 years in the wild."
Example 2: Lion Habitats
print(rag_pipeline("Where do lions live?"))
# Example output (exact wording may vary by model version): "Lions live in Africa and India."
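Generated answers like these can be sanity-checked against the passages they were built from. The sketch below is one simple heuristic, not a full fact-checker: it measures what fraction of the answer's words also appear in the retrieved text. support_ratio is a hypothetical helper, and any cutoff for a "low" score is something you would have to choose yourself.

```python
import re

def support_ratio(answer, docs):
    """Fraction of answer tokens that also occur somewhere in the retrieved docs."""
    def tokens(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    doc_vocab = set(tokens(" ".join(docs)))
    return sum(tok in doc_vocab for tok in answer_tokens) / len(answer_tokens)

docs = ["Lions are big cats living in Africa and India. They survive 10-14 years in the wild."]
grounded = support_ratio("Lions live in Africa and India.", docs)
ungrounded = support_ratio("Lions live in Australia for 50 years.", docs)
print(grounded > ungrounded)
```

A low ratio suggests the generator introduced words that the context does not support, which is a cheap signal that the answer deserves a closer look.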
Key Considerations & Enhancements
- Scalability: Use larger datasets (e.g., Wikipedia) and distributed vector databases like Pinecone.
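- Better Embeddings: Replace all-MiniLM-L6-v2 with larger models (e.g., multi-qa-mpnet-base-dot-v1, which is tuned for dot-product scoring and so pairs better with faiss.IndexFlatIP than with IndexFlatL2).
- Hybrid Retrieval: Combine dense vectors (FAISS) with sparse keyword matching (BM25) for robustness.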
- Generator Choice: Experiment with models like FLAN-T5 or GPT-3 for nuanced answers.
- Post-Processing: Validate answers against retrieved documents to reduce hallucinations.
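For the hybrid retrieval idea, the dense and sparse retrievers each produce their own ranking, and those rankings have to be merged somehow. One common option, which this guide does not prescribe, is reciprocal rank fusion. The sketch below assumes each retriever has already returned an ordered list of passage indices; the rankings shown are made up, and k=60 is just a conventional default.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids; each list contributes 1 / (k + rank) per doc."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

dense_ranking = [2, 0, 1]   # hypothetical order from the FAISS search
sparse_ranking = [0, 1, 2]  # hypothetical order from a keyword scorer such as BM25
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Working on ranks rather than raw scores avoids having to put L2 distances and keyword scores on a common scale.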
Conclusion
RAG systems empower LLMs to deliver accurate answers by leveraging external knowledge dynamically. This example demonstrates a basic implementation—real-world applications require optimizations like efficient indexing and advanced reranking strategies. By integrating retrieval with generation, you can build LLM-powered systems that stay current and factually grounded.
Next Steps: Explore advanced frameworks like LangChain or Haystack for production-ready RAG pipelines!