Document Review and Fact-Checking Agent

I designed an internal document-review and fact-checking system for comparing claims against source evidence across mixed document formats.

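The system ingests PDF, Word, PowerPoint, HTML, TXT, Markdown, and CSV files; extracts atomic claims; embeds them with a Qwen3 embedding model; stores the vectors in Milvus; and runs claim-evidence retrieval in both directions, from each claim to candidate evidence and from each evidence passage back to the claims it may support or contradict.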
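Bidirectional retrieval can be sketched without a vector database. This is a minimal illustration, not the system's code: short hand-written vectors stand in for Qwen3 embeddings, brute-force cosine similarity stands in for a Milvus search, and the names `bidirectional_pairs`, `k`, and `min_sim` are hypothetical. Pairs found in either direction become review candidates, pairs found in both directions are kept as higher-confidence links, and claims that match no evidence above the threshold are returned as candidates for missing evidence.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, candidates, k):
    """Indices of the k candidates most similar to the query, best first."""
    order = sorted(range(len(candidates)),
                   key=lambda i: cosine(query, candidates[i]), reverse=True)
    return order[:k]


def bidirectional_pairs(claim_vecs, evidence_vecs, k=3, min_sim=0.5):
    """Hypothetical helper: retrieve claim->evidence and evidence->claim.

    Returns (candidates, mutual, unsupported):
      candidates  - (claim, evidence) index pairs found in either direction
      mutual      - pairs found in both directions
      unsupported - claim indices with no evidence above min_sim
    """
    forward = {(c, e)
               for c, vec in enumerate(claim_vecs)
               for e in top_k(vec, evidence_vecs, k)
               if cosine(vec, evidence_vecs[e]) >= min_sim}
    backward = {(c, e)
                for e, vec in enumerate(evidence_vecs)
                for c in top_k(vec, claim_vecs, k)
                if cosine(vec, claim_vecs[c]) >= min_sim}
    candidates = forward | backward
    mutual = forward & backward
    matched = {c for c, _ in candidates}
    unsupported = [c for c in range(len(claim_vecs)) if c not in matched]
    return candidates, mutual, unsupported


# Toy 3-d vectors standing in for real embeddings.
claims = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
evidence = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
candidates, mutual, unsupported = bidirectional_pairs(claims, evidence, k=1)
print(sorted(candidates), unsupported)  # [(0, 0), (1, 1)] [2]
```

Searching in both directions catches evidence that ranks a claim highly even when that evidence is not among the claim's own top-k results, a pair a one-way claim-to-evidence search would miss.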

What It Detects

  • Contradictions between claims and source evidence.
  • Missing evidence and unsupported claims.
  • Logic changes between related statements.
  • Self-contradictions inside a document.
  • Pair contradictions across documents.
  • Conditional conflicts that need context-sensitive validation.
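Some of these categories can be pre-screened with cheap surface rules before any model is called. The sketch below is an assumption about what such rules might look like, not the system's actual rule set: the hypothetical `rule_flags` marks a claim/evidence pair when the claim states a number the evidence does not (a possible contradiction) or when exactly one side is negated (a possible logic change). The cue words and the regular expression are illustrative choices.

```python
import re

# Hypothetical surface rules; a real system would tune these per domain.
NEGATION_WORDS = {"not", "no", "never", "none", "cannot", "without"}
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers(text):
    """Set of numeric values in the text, with thousands separators removed."""
    return {float(n) for n in NUMBER.findall(text.replace(",", ""))}


def is_negated(text):
    """True if the text contains a simple negation cue."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(t in NEGATION_WORDS or t.endswith("n't") for t in tokens)


def rule_flags(claim, evidence):
    """Cheap checks that mark a claim/evidence pair for closer review."""
    flags = []
    claim_nums, evidence_nums = numbers(claim), numbers(evidence)
    # Both sides state numbers, but the claim has a value the evidence lacks.
    if claim_nums and evidence_nums and claim_nums - evidence_nums:
        flags.append("numeric_mismatch")
    # One side is negated and the other is not: a possible logic change.
    if is_negated(claim) != is_negated(evidence):
        flags.append("negation_flip")
    return flags


print(rule_flags("Revenue grew 12% in 2023.", "Revenue grew 9% in 2023."))
# ['numeric_mismatch']
print(rule_flags("The drug is approved for children.",
                 "The drug is not approved for children."))
# ['negation_flip']
```

Rules like these misfire on unit changes, paraphrases, and double negatives, so in a sketch of this kind they only nominate pairs; a model-based validation step would decide whether a flagged pair is a real conflict.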

Approach

The workflow combines rule-based checks, NLI-style validation, LLM context validation, vector retrieval, and structured ranking. This makes it more useful than a simple “ask the PDF” workflow: instead of answering one question at a time, the system surfaces specific review targets, explains why each one matters, and ranks them so the most likely issues are reviewed first.
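One way to turn these signals into a prioritized review list is a weighted score. This is a hypothetical sketch: the `Finding` fields, the weights, and the verdict labels `confirmed`, `uncertain`, and `dismissed` are assumptions made for illustration, and the NLI contradiction probability and the LLM verdict are taken as inputs rather than computed here.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    claim_id: str
    evidence_id: str
    similarity: float          # retrieval similarity, assumed scaled to [0, 1]
    nli_contradiction: float   # contradiction probability from an NLI model
    rule_flags: list = field(default_factory=list)
    llm_verdict: str = "uncertain"  # "confirmed", "uncertain", or "dismissed"


# Hypothetical weights; real values would be tuned on labeled review data.
WEIGHTS = {"similarity": 0.2, "nli": 0.5, "rules": 0.3}
VERDICT_FACTOR = {"confirmed": 1.0, "uncertain": 0.6, "dismissed": 0.1}


def priority(f):
    """Weighted evidence score, scaled by the LLM context-validation verdict."""
    rules = min(len(f.rule_flags), 2) / 2  # saturate after two rule hits
    base = (WEIGHTS["similarity"] * f.similarity
            + WEIGHTS["nli"] * f.nli_contradiction
            + WEIGHTS["rules"] * rules)
    return base * VERDICT_FACTOR[f.llm_verdict]


def rank(findings):
    """Highest-priority findings first."""
    return sorted(findings, key=priority, reverse=True)


findings = [
    Finding("c3", "e2", 0.95, 0.90, ["negation_flip"], "dismissed"),
    Finding("c2", "e4", 0.80, 0.70),
    Finding("c1", "e1", 0.90, 0.95, ["numeric_mismatch"], "confirmed"),
]
for f in rank(findings):
    print(f.claim_id, round(priority(f), 3))
```

Using the LLM verdict as a multiplier rather than a filter keeps dismissed pairs in the list but pushes them toward the bottom, so a reviewer can still audit them.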

Technologies

Python, LangChain, local LLMs, Qwen3 embeddings, Milvus, document parsing, information retrieval, NLI, and RAG.