Cobb County Building & Fire Code Agentic RAG Assistant

Agentic RAG LangChain Chroma BM25 Hybrid Search Query Expansion PyPDF Docling Streamlit Docker LangSmith

This project is an agentic retrieval-augmented generation (RAG) application for Cobb County, Georgia building and fire code questions. It uses local PDF documents as the primary knowledge base, supports four retrieval configurations, and falls back to official web sources when local evidence is weak or when a question asks about current, adopted, recently changed, or effective-date information.

The system demonstrates production-minded RAG engineering: document ingestion, selectable retrieval backends, vector and hybrid search, LLM routing, deterministic context expansion, strict evidence gating, source-grounded synthesis, web verification, LangSmith evaluation, Docker support, and a recruiter-friendly Streamlit interface.

Background & Problem Statement

Building and fire code information is difficult to search because it is spread across ordinances, code PDFs, county forms, permit guidance, plan-review checklists, Fire Marshal documents, and state-level code references. A homeowner, contractor, reviewer, or administrative user may know the question they want to ask, but not the exact document, section, or page where the answer appears.

Problem Statement: Can a deployable RAG assistant help users ask natural-language questions about Cobb County code and permitting documents while keeping answers grounded in cited local documents and current web evidence when verification is needed?

Streamlit Chat Interface

The front end is a Streamlit chat application designed for portfolio review and technical inspection. Users submit a code or permitting question, then receive a concise answer with source references and an answer-source label. The Settings & Eval dashboard lets users switch between all four retrieval options and view cached LangSmith evaluation metrics without restarting the app.

Document Corpus & Data Policy

The local corpus is designed around Cobb County, Georgia building and fire code research. It includes county ordinance PDFs, Fire Marshal forms and checklists, building permit and tenant build-out guidance, fire inspection documents, hydrant and sprinkler resources, emergency equipment guidance, and Georgia code references.

  • Local PDFs used during development: 41 files.
  • Raw PDF size: approximately 60 MB.
  • Loaded pages: 4,093+ pages.
  • Vector chunks: 13,844 chunks stored in Chroma.
  • Repository policy: raw PDFs are excluded, while the generated vectorstore and BM25 artifacts are tracked with Git LFS for deployment.

Supported Retrieval Backends

Backend UI Label Retrieval Strategy
cobb_code_docs_original Option 1: PyPDF + Chromadb Original PyPDF/LangChain page extraction with Chroma vector search
cobb_code_docs_docling Option 2: Docling + Chromadb Docling layout-aware Markdown conversion with Chroma vector search
docling_chroma_bm25_hybrid Option 3: Docling + Chroma + BM25 Hybrid Search Docling chunks searched with both Chroma vector retrieval and a persisted local BM25 keyword corpus
docling_chroma_bm25_expansion Option 4: Docling + Query Expansion + BM25 Hybrid Search Reuses Option 3 indexes, expands the original question into five retrieval queries, deduplicates results, and fuses rankings

Agentic RAG Architecture

The app uses an agentic RAG workflow rather than a single prompt-only LLM call. A lightweight LLM router evaluates the question, the selected retrieval backend searches local evidence, deterministic neighbor expansion keeps nearby checklist items visible, an evidence check determines whether the local context is sufficient, and web search is triggered when current-code verification or weak local evidence requires it.

  • Settings selector: switches new questions across four retrieval configurations without requiring an app restart.
  • Query routing: flags whether the question needs local documents, web search, or both.
  • Local retrieval: uses Chroma vector search or Chroma + BM25 hybrid retrieval depending on the selected option.
  • Query expansion: Option 4 uses one LLM call to create four additional retrieval queries before hybrid retrieval.
  • Deterministic context expansion: expands each retrieved chunk with same-document neighboring chunks chunk_index - 1 and chunk_index + 1.
  • Strict evidence gate: requires exact supporting evidence for numeric, code, inspection, permit, and procedural answers.
  • Web fallback: prioritizes public county and state web sources for current-code verification.

Retrieval Engineering

This project does not train a conventional tabular ML model. Instead, the engineering work focuses on building reliable retrieval over long, heterogeneous PDF documents where exact numbers, section references, conditions, and procedural language matter. The final app compares vector-only retrieval against local hybrid retrieval and query-expansion retrieval.

  • Original parsing: preserves the initial PyPDF/LangChain page extraction pipeline.
  • Docling parsing: converts layout-aware PDF content to Markdown before chunking, which can help with headings, tables, sections, and multi-column documents.
  • BM25 hybrid retrieval: combines keyword matching with dense vector retrieval using Reciprocal Rank Fusion.
  • Query expansion: generates four additional technical and step-back queries, retrieves with all five queries, deduplicates, and fuses rankings.
  • Context expansion: retrieves small chunks for search quality, then expands each hit with only same-document neighboring chunks before adequacy checking.
  • Metadata tracking: file name, source path, parser type, backend, chunk index, section, and page range are retained where available.
  • Evidence thresholding: weak retrieval or current-code language can trigger web search or conservative abstention.

LangSmith Tracing, Evaluation & Validation

LangSmith traces were used to inspect the full chain input, routing context, local PDF evidence, web-search evidence, token usage, latency, and final answers. The project uses a fixed 50-question golden evaluation set and cached LangSmith experiment scores for faithfulness, answer relevance, context precision, context recall, and latency.

Evaluation Methodology

The Settings & Eval dashboard compares all four retrieval configurations using the same fixed 50-question golden dataset. Metrics are persisted under eval_results/ so the dashboard can load results immediately without automatically launching a new evaluation run.

  • Faithfulness: whether all answer claims are supported solely by the retrieved context.
  • Answer relevance: whether the generated response directly answers the user's question.
  • Context precision: whether retrieved context is useful and well-ranked rather than noisy.
  • Context recall: whether the retrieved context contains the necessary facts compared with the ground-truth answer.
  • Latency: mean, P50, and P99 runtime in seconds across the fixed evaluation set.

Results & System Checks

The project was validated through ingestion checks, retrieval smoke tests, current-code routing tests, web search fallback tests, syntax checks, Docker support, manual review of representative questions, and LangSmith evaluation across four retrieval options.

Retrieval Method Performance Comparison

Option Method Answer Relevance Context Precision Context Recall Faithfulness Avg Latency (s) P50 (s) P99 (s)
1 PyPDF + Chromadb 0.695 0.695 0.610 0.940 15.3 10.4 108.8
2 Docling + Chromadb 0.625 0.650 0.615 0.930 13.6 12.2 36.1
3 Docling + Chroma + BM25 Hybrid Search 0.725 0.765 0.720 0.955 15.4 11.4 69.0
4 Docling + Query Expansion + BM25 Hybrid Search 0.730 0.755 0.645 0.965 23.4 21.3 50.2

Best balanced configuration: Option 3 is the strongest practical default because it produces the best context precision and context recall while avoiding the extra query-expansion latency of Option 4. Highest faithfulness: Option 4 reaches the top faithfulness score, but the added query-expansion LLM call increases median latency and does not improve context recall over Option 3.

Validation Summary

Test Area Result Notes
PDF ingestion Passed Loaded Cobb County and Georgia code PDFs
Vector index build Passed Indexed 13,844 chunks into Chroma
Docling-enhanced indexing Added Builds cobb_code_docs_docling from Docling Markdown output
BM25 hybrid retrieval Added Persists a local BM25 corpus and fuses keyword and vector rankings
Query expansion retrieval Added Creates five total retrieval queries before hybrid ranking
Deterministic context expansion Added Expands retrieved chunks with same-document -1/+1 neighbors before evidence checks
Strict evidence gate Added Requires exact support before numeric, code, inspection, permit, or procedural answers
LangSmith evaluation cache Added Saves per-backend metrics under eval_results/
Web fallback Passed SerpAPI Google Search works from the app environment
Deployment support Included Streamlit Community Cloud, Dockerfile, and docker-compose.yml

Key Insights

  • Hybrid retrieval mattered most: Docling alone did not outperform the PyPDF baseline, but Docling paired with BM25 improved context precision and recall.
  • Faithfulness improved through stricter evidence gating: the app refuses numeric, code, permit, inspection, and procedural answers unless exact support is visible in the supplied context.
  • Neighbor expansion reduced false refusals: adding same-document neighboring chunks helps when a retrieved chunk lands just before an answer-bearing checklist item or table row.
  • Query expansion has a cost: Option 4 produced the highest faithfulness and answer relevance, but at a substantially higher median latency.
  • Recommended demo configuration: Option 3 gives the best retrieval-quality tradeoff for portfolio demonstrations.

Applied ML Engineering Value

This project demonstrates how modern LLM systems can be adapted to document-heavy public-sector workflows where answers must be grounded, current, and easy to verify. It is especially relevant to customer-facing AI platforms, conversational question answering, search and retrieval, agent workflows, and intelligent automation.

  • RAG system design: combines selectable retrieval backends, local retrieval, hybrid search, routing, evidence checking, and web fallback.
  • Operational grounding: returns source references instead of unsupported open-ended answers.
  • Production mindset: includes reproducible ingestion, Docker support, environment templates, Git LFS artifacts, and hosted app deployment.
  • Evaluation and observability: uses LangSmith traces and fixed golden-set metrics to inspect retrieval quality, routing decisions, latency, and generated outputs.

GitHub Repository & Live Demo

The full implementation and hosted Streamlit demo are available through the links below.

🔗 View Project Repository on GitHub

🚀 Open Live Demo

The repository includes the Streamlit app, LangChain agent workflow, Chroma retriever, BM25 hybrid retrieval, query expansion retrieval, PyPDF and Docling ingestion scripts, deterministic context expansion, LangSmith evaluation workflow, Docker files, environment template, and documentation for rebuilding local vector and BM25 indexes.

Disclaimer: this is a portfolio demonstration and educational project. It is not legal, engineering, building code, fire code, or permitting advice. Users should verify requirements directly with Cobb County, the Georgia Department of Community Affairs, the State Fire Marshal, and the authority having jurisdiction.