Building a Self-Correcting Fact-Checker with DSPy
LLMs can sound confident even when they’re wrong. Retrieval‑Augmented Generation (RAG) helps, but it doesn’t guarantee the model will stick to the retrieved evidence. In this post, we’ll build a self‑correcting fact‑checker in DSPy that:
- retrieves evidence from Wikipedia,
- answers only from that evidence, and
- verifies the answer.
If verification flags unsupported claims, the system automatically retries with feedback until the answer is fully supported, using dspy.Refine. The entire code sample is on GitHub.
Why DSPy for self‑correction?
DSPy lets you compose LM programs from small modules (e.g., ChainOfThought), then add search, verification, and retry as code - not prompt hacks. We configure OpenAI’s gpt-4o-mini in one line and use Refine to drive iterative improvement until a reward function says we’re done. (DSPy)
What we’ll build
- Retriever: WikipediaRetriever queries the Wikipedia API and returns passages. Note: DSPy's Retrieve expects the retriever to return items with a .long_text attribute (we use dotdict). (GitHub)
- Generator: GenerateAnswer produces a concise answer only from the given context.
- Verifier: VerifyAnswer lists any claims not supported by the context.
- Refiner: dspy.Refine wraps the generator and automatically retries with feedback until the verifier returns "None" (no unsupported claims).
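Here is a rough skeleton of how these four pieces compose into a single DSPy module. The class name, the inline string signatures, and the output field names are illustrative assumptions; the actual fact_check_rag.py may name things differently.

import dspy

class FactCheckedRAG(dspy.Module):
    """Illustrative skeleton (names are assumptions); see fact_check_rag.py for the real module."""

    def __init__(self, k_passages=4):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=k_passages)
        # Inline string signatures keep this sketch self-contained;
        # the script uses the named GenerateAnswer / VerifyAnswer signatures instead.
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")
        self.verify_answer = dspy.ChainOfThought("context, answer -> unsupported_claims")

    def forward(self, question):
        context = self.retrieve(question).passages
        # Step 3 below wraps this call in dspy.Refine so it can retry with feedback.
        pred = self.generate_answer(context=context, question=question)
        check = self.verify_answer(context=context, answer=pred.answer)
        return dspy.Prediction(
            answer=pred.answer,
            unsupported_claims=check.unsupported_claims,
            context=context,
        )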
Setup
pip install dspy-ai wikipedia
# Set your key:
# Windows (persist): setx OPENAI_API_KEY "sk-..."
# PowerShell (session): $env:OPENAI_API_KEY="sk-..."
# macOS/Linux: export OPENAI_API_KEY="sk-..."
The full script
See fact_check_rag.py in the GitHub repository. Key parts:
1) Configure the LM and retriever
import os
import dspy

lm = dspy.LM("openai/gpt-4o-mini", api_key=os.environ.get("OPENAI_API_KEY"))
dspy.configure(lm=lm)  # sets the default LM globally

wiki_rm = WikipediaRetriever(max_chars_per_passage=1500, language="en")  # custom retriever class from the script
dspy.settings.configure(rm=wiki_rm)  # registers it as the default retrieval model
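The custom WikipediaRetriever itself isn't shown above. Below is a minimal sketch of what such a retriever can look like, assuming the wikipedia package and DSPy's dotdict helper (whose import path varies across DSPy versions). The key point is that each returned item exposes .long_text, which dspy.Retrieve expects.

import wikipedia
from dspy.dsp.utils import dotdict  # import path may differ by DSPy version

class WikipediaRetriever:
    """Sketch of a Wikipedia-backed retriever; the version in the repo may differ in detail."""

    def __init__(self, max_chars_per_passage=1500, language="en"):
        self.max_chars = max_chars_per_passage
        wikipedia.set_lang(language)

    def __call__(self, query, k=3):
        passages = []
        for title in wikipedia.search(query, results=k):
            try:
                page = wikipedia.page(title, auto_suggest=False)
            except wikipedia.exceptions.WikipediaException:
                continue  # skip disambiguation pages and missing articles
            text = f"{page.title} ({page.url}): {page.content[:self.max_chars]}"
            # dspy.Retrieve expects items exposing a .long_text attribute
            passages.append(dotdict({"long_text": text}))
        return passages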
2) Retrieval → Generation → Verification
- self.retrieve = dspy.Retrieve(k=4) calls our retriever and returns the passages.
- self.generate_answer = dspy.ChainOfThought(GenerateAnswer) writes the answer.
- self.verify_answer = dspy.ChainOfThought(VerifyAnswer) checks support.
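GenerateAnswer and VerifyAnswer are ordinary DSPy signatures. A plausible sketch is below; the field name unsupported_claims and the exact instruction wording are assumptions, not necessarily what the repository uses.

import dspy

class GenerateAnswer(dspy.Signature):
    """Answer the question using ONLY the provided context."""

    context = dspy.InputField(desc="retrieved Wikipedia passages")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="concise answer grounded in the context")

class VerifyAnswer(dspy.Signature):
    """List any claims in the answer that are not supported by the context.
    Reply 'None' if every claim is supported."""

    context = dspy.InputField(desc="retrieved Wikipedia passages")
    answer = dspy.InputField(desc="candidate answer to check")
    unsupported_claims = dspy.OutputField(desc="'None' or a list of unsupported claims")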
3) Self‑correction with Refine
We define a small reward function that calls the verifier. If it returns “None”, the reward is 1.0 and Refine stops; otherwise, Refine injects feedback and retries up to N times:
self.refine_generate = dspy.Refine(
module=self.generate_answer,
N=max_attempts,
reward_fn=reward_fn,
threshold=1.0,
)
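For reference, a minimal reward function might look like the following, assuming it is defined inside the module's __init__ (so it can close over self.verify_answer), that the generator is called with a context keyword, and that the verifier's output field is named unsupported_claims.

# Defined inside __init__ so it can close over self.verify_answer.
def reward_fn(args, pred):
    # args: the kwargs the generator was called with; pred: its output.
    check = self.verify_answer(context=args["context"], answer=pred.answer)
    verdict = check.unsupported_claims.strip().lower()
    return 1.0 if verdict == "none" else 0.0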
Running it
Try:
“When did Apollo 11 land on the Moon, and who were the astronauts involved?”
You’ll see:
- Final Answer - concise and grounded in the retrieved Wikipedia passages.
- Unsupported Claims - should print None if verification passes.
- Context used - the exact passages (with titles + URLs) that grounded the answer.
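Assuming the module class and field names from the earlier sketches, a run looks roughly like this:

fact_checker = FactCheckedRAG(k_passages=4)

result = fact_checker(
    question="When did Apollo 11 land on the Moon, and who were the astronauts involved?"
)

print("Final Answer:", result.answer)
print("Unsupported Claims:", result.unsupported_claims)  # "None" when verification passes
print("Context used:")
for passage in result.context:
    print("-", passage[:120], "...")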
Customize
- Swap the question for any topic Wikipedia covers well.
- Adjust k_passages (retrieval breadth) and max_attempts (strictness).
- Adapt the verifier's wording if you want stricter or looser checks.
Takeaways
- Separate concerns: retrieval (evidence), generation (answer), verification (checks).
- Make it self-correcting: wrap generation with Refine and a simple reward.
- Mind the interface: custom retrievers should return objects with .long_text for smooth integration with dspy.Retrieve. (GitHub)