Enhancing Language Models with RAG in Python

Posted by Jesse JCharis

Feb. 23, 2025, 5:39 a.m.


Large Language Models (LLMs) like GPT-4 and BART excel at generating human-like text but often struggle with factual accuracy or accessing up-to-date information not present in their training data. Retrieval-Augmented Generation (RAG) addresses this limitation by combining retrieval from external knowledge sources with generative models to produce more accurate and contextually relevant outputs.

In this guide, we’ll implement a RAG system in Python using Hugging Face’s transformers and sentence-transformers libraries, along with Facebook’s FAISS for efficient similarity search.


What Is Retrieval-Augmented Generation (RAG)?

RAG integrates two components:

  1. Retriever: Searches external datasets or knowledge bases to fetch relevant information.
  2. Generator: An LLM that synthesizes retrieved information into coherent answers.

This approach enhances LLMs by grounding responses in dynamically retrieved facts rather than relying solely on pre-trained knowledge.


Implementing RAG in Python

Step 1: Install Dependencies

pip install transformers sentence-transformers faiss-cpu

Step 2: Prepare Sample Data

For simplicity, we’ll use manually curated passages about animals:

passages = [
    "Elephants are large mammals known for their long trunks and tusks. The average lifespan of an African elephant is 60-70 years in the wild.",
    "Lions are big cats living in Africa and India. They survive 10-14 years in the wild.",
    "Penguins are flightless birds inhabiting Antarctica. Emperor penguins can live up to 20 years.",
]

Step 3: Encode Passages and Build a FAISS Index

We’ll use sentence-transformers to encode text into embeddings and FAISS for fast retrieval:

from sentence_transformers import SentenceTransformer
import faiss

# Encode passages into dense vectors (a float32 NumPy array, which FAISS expects)
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # Lightweight embedding model
embeddings = encoder.encode(passages)

# Create a flat (exact-search) FAISS index over L2 distance
dim = embeddings.shape[1]
index = faiss.IndexFlatL2(dim)
index.add(embeddings)
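
As a quick optional sanity check, you can confirm the index now holds one vector per passage:

# The index should contain exactly as many vectors as we have passages
print(index.ntotal)  # -> 3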

Step 4: Define Retriever Function

This function retrieves top-k relevant passages for a query:

def retrieve(query_embedding, k=2):
    # index.search expects a batch of queries; we pass one, so take row 0 of the results
    distances, indices = index.search(query_embedding, k)
    return [passages[i] for i in indices[0]]
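
To try the retriever in isolation (a minimal illustration; the query string here is just an example):

query_embedding = encoder.encode(["How long do elephants live?"])
print(retrieve(query_embedding, k=1))
# Expected: the elephant passage from our list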

Step 5: Initialize Generator Model

We’ll use Hugging Face’s BART model for text generation:

from transformers import pipeline

generator = pipeline(
    "text2text-generation",
    model="facebook/bart-large-cnn"  # Summarization-focused model
)
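
Before wiring the generator into the pipeline, you can try it on its own. The input below mimics the question/context format we build in Step 6; since bart-large-cnn is tuned for summarization, expect it to condense the context rather than answer free-form:

sample = "question: What are penguins? context: Penguins are flightless birds inhabiting Antarctica."
print(generator(sample, max_length=50)[0]["generated_text"])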

Step 6: Build the RAG Pipeline

Combine retrieval and generation into one workflow:

def rag_pipeline(query):
    # Retrieve relevant documents
    query_embedding = encoder.encode([query])
    retrieved_docs = retrieve(query_embedding)
    
    # Format input for generator
    context = " ".join(retrieved_docs)
    input_text = f"question: {query} context: {context}"
    
    # Generate answer
    answer = generator(input_text, max_length=100)
    return answer[0]["generated_text"]

Testing the RAG System

Let’s ask questions requiring factual knowledge:

Example 1: Lifespan of Elephants

print(rag_pipeline("What is the average lifespan of an African elephant?"))
# Output: "The average lifespan of an African elephant is 60-70 years in the wild."

Example 2: Lion Habitats

print(rag_pipeline("Where do lions live?"))
# Output: "Lions live in Africa and India."

Key Considerations & Enhancements

  1. Scalability: Use larger datasets (e.g., Wikipedia) and distributed vector databases like Pinecone.
  2. Better Embeddings: Replace all-MiniLM-L6-v2 with a stronger model (e.g., multi-qa-mpnet-base-dot-v1; note that dot-product models pair better with an inner-product index like faiss.IndexFlatIP than with IndexFlatL2).
  3. Hybrid Retrieval: Combine dense vectors (FAISS) with sparse keyword matching (BM25) for robustness (a minimal sketch follows this list).
  4. Generator Choice: Experiment with models like FLAN-T5 or GPT-3 for nuanced answers.
  5. Post-Processing: Validate answers against retrieved documents to reduce hallucinations.
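
Below is a minimal sketch of the hybrid-retrieval idea from point 3. It assumes the rank_bm25 package (pip install rank-bm25) and reuses the encoder, index, and passages defined above; the fusion weight alpha and the min-max normalization are illustrative choices, not tuned values.

import numpy as np
from rank_bm25 import BM25Okapi

# Build a BM25 index over whitespace-tokenized passages
tokenized = [p.lower().split() for p in passages]
bm25 = BM25Okapi(tokenized)

def _minmax(scores):
    # Min-max normalize so dense and sparse scores live on the same scale
    rng = scores.max() - scores.min()
    return (scores - scores.min()) / rng if rng else scores

def hybrid_retrieve(query, k=2, alpha=0.5):
    # Dense scores: negate L2 distances so that larger means more similar
    q_emb = encoder.encode([query])
    dense_dist, dense_idx = index.search(q_emb, len(passages))
    dense_scores = np.zeros(len(passages), dtype="float32")
    dense_scores[dense_idx[0]] = -dense_dist[0]

    # Sparse scores: BM25 keyword matching
    sparse_scores = np.array(bm25.get_scores(query.lower().split()))

    # Simple weighted fusion; production systems often use reciprocal rank fusion instead
    combined = alpha * _minmax(dense_scores) + (1 - alpha) * _minmax(sparse_scores)
    top = np.argsort(combined)[::-1][:k]
    return [passages[i] for i in top]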

Conclusion

RAG systems empower LLMs to deliver accurate answers by leveraging external knowledge dynamically. This example demonstrates a basic implementation—real-world applications require optimizations like efficient indexing and advanced reranking strategies. By integrating retrieval with generation, you can build LLM-powered systems that stay current and factually grounded.

Next Steps: Explore advanced frameworks like LangChain or Haystack for production-ready RAG pipelines!
