Cobb County Building & Fire Code Agentic RAG Assistant
This project is an agentic retrieval-augmented generation (RAG) application for Cobb County, Georgia building and fire code questions. It uses local PDF documents as the primary knowledge base, then falls back to official web sources when the local evidence is weak or when a question asks about current, adopted, recently changed, or effective-date information.
The goal is to demonstrate a production-minded RAG workflow: document ingestion, chunking, vector search, LLM-assisted routing, source-grounded synthesis, web verification, observability, Docker support, and a recruiter-friendly Streamlit interface.
Background & Problem Statement
Building and fire code information is difficult to search because it is spread across ordinances, code PDFs, county forms, permit guidance, checklists, and state-level code references. A homeowner, contractor, reviewer, or administrative user may know the question they want to ask, but not the exact document, section, or page where the answer appears.
Problem Statement: Can a deployable RAG assistant help users ask natural-language questions about Cobb County code and permitting documents while keeping answers grounded in cited local documents and, when verification is needed, in current web evidence?
Streamlit Chat Interface
The front end is a simple Streamlit chat application designed for fast review by recruiters, technical interviewers, and portfolio visitors. Users submit a code or permitting question, then receive a concise answer of two to three paragraphs with source references and a visible answer-source label.
Document Corpus & Data Policy
The local corpus is designed around Cobb County, Georgia building and fire code research. It includes county ordinance PDFs, Fire Marshal forms and checklists, building permit guidance, tenant build-out documents, fire inspection documents, hydrant and sprinkler resources, emergency equipment guidance, and Georgia code references.
- Local PDFs used during development: 41 files.
- Raw PDF size: approximately 60 MB.
- Loaded pages: more than 4,093.
- Vector chunks: 13,844 chunks stored in Chroma.
- Repository policy: raw PDFs are excluded from the repository, while the generated Chroma vector store is tracked with Git LFS for Streamlit Community Cloud deployment.
Agentic RAG Architecture
The app uses an agentic RAG workflow rather than a single prompt-only LLM call. A lightweight LLM router evaluates the question, local retrieval searches the Chroma vector database, an evidence check decides whether local context is sufficient, and web search is triggered when the question requires current verification or the retrieved local evidence is incomplete.
- Query routing: flags whether the question needs local documents, web search, or both.
- Local retrieval: searches embedded Cobb County PDF chunks with file and page metadata.
- Evidence checking: evaluates whether the retrieved passages clearly answer the question.
- Web fallback: prioritizes public county and state web sources for current-code verification.
- Guardrails: avoids legal, engineering, or permit-guarantee advice and uses conservative language.
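The routing and fallback decision described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the real app uses an LLM router, whereas this sketch substitutes a keyword check, and the names `RetrievedChunk`, `plan_route`, `CURRENT_INFO_TERMS`, and the `min_score` and `min_hits` thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical trigger phrases; the actual app delegates this to an LLM router.
CURRENT_INFO_TERMS = ("current", "adopted", "recently changed", "effective date")


@dataclass
class RetrievedChunk:
    text: str
    source: str
    page: int
    score: float  # assumed similarity in [0, 1], higher means more relevant


def needs_current_info(question: str) -> bool:
    """Keyword stand-in for the router's 'needs current verification' flag."""
    lowered = question.lower()
    return any(term in lowered for term in CURRENT_INFO_TERMS)


def local_evidence_is_sufficient(chunks, min_score=0.75, min_hits=2) -> bool:
    """Treat local evidence as sufficient when enough chunks score highly."""
    strong = [c for c in chunks if c.score >= min_score]
    return len(strong) >= min_hits


def plan_route(question: str, chunks) -> dict:
    """Decide whether web search should supplement local retrieval."""
    if needs_current_info(question):
        return {"use_web": True, "reason": "current-code verification"}
    if not local_evidence_is_sufficient(chunks):
        return {"use_web": True, "reason": "weak local evidence"}
    return {"use_web": False, "reason": "local evidence sufficient"}
```

Under these assumptions, a question containing "currently adopted" would route to web verification even when local retrieval is strong, while a question with no strong local chunks would fall back to the web because of weak evidence.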
Retrieval Engineering
This project does not train a conventional tabular ML model. Instead, the engineering work focuses on building a reliable retrieval system over long, heterogeneous PDF documents. The key design requirement is preserving enough metadata for users to verify where an answer came from.
- PDF parsing: documents are loaded page by page.
- Text chunking: pages are split into overlapping chunks so code context is not lost.
- Metadata tracking: file name, source path, and page number are retained.
- Embeddings: each chunk is converted into a vector representation.
- Vector indexing: chunks and metadata are stored in Chroma for semantic search.
- Fallback thresholding: weak retrieval or current-code language can trigger web search.
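The chunking and metadata steps above can be illustrated with a small, dependency-free sketch. It assumes character-based splitting with a fixed overlap; the project's actual splitter, chunk size, and overlap are not stated here, so `chunk_page`, `Chunk`, and the default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    file_name: str
    page: int
    start: int  # character offset of the chunk within the page text


def chunk_page(text, file_name, page, chunk_size=1000, overlap=200):
    """Split one page into overlapping chunks that keep file and page metadata."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(text), 1), step):
        piece = text[start:start + chunk_size]
        if piece.strip():
            chunks.append(Chunk(piece, file_name, page, start))
        # Stop once this chunk has reached the end of the page.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Because each chunk carries its file name and page number, a retrieved passage can be cited back to a specific page even after it has been embedded and stored in the vector index.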
LangSmith Tracing & Validation
LangSmith traces were used to inspect the full chain input, routing context, local PDF evidence, web-search evidence, token usage, latency, and final answers. This helped verify that the app was not just producing plausible responses, but using the intended retrieval and fallback workflow.
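One way to make individual steps visible in LangSmith is the SDK's `traceable` decorator. The sketch below is an assumption about how such instrumentation could look, not the project's code: the function names, run names, and placeholder passage are hypothetical, and the fallback decorator exists only so the example runs without the SDK installed.

```python
import time

try:
    # Decorator from the LangSmith SDK; traces are only sent when tracing
    # is enabled through the SDK's environment configuration.
    from langsmith import traceable
except ImportError:
    # No-op fallback so this sketch still runs without the SDK.
    def traceable(**_kwargs):
        def decorator(func):
            return func
        return decorator


@traceable(run_type="retriever", name="local_pdf_retrieval")
def retrieve_local(question: str) -> list:
    # Placeholder result standing in for a real vector search.
    return [{"file": "example.pdf", "page": 1, "text": "placeholder passage"}]


@traceable(run_type="chain", name="answer_question")
def answer_question(question: str) -> dict:
    started = time.perf_counter()
    evidence = retrieve_local(question)
    return {
        "question": question,
        "sources": [f"{e['file']} p.{e['page']}" for e in evidence],
        "latency_s": round(time.perf_counter() - started, 4),
    }
```

With tracing enabled, nested decorated calls like these would appear as parent and child runs, which is the kind of view that makes it possible to confirm the retrieval step actually ran before the answer was produced.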
Results & System Checks
The project was validated through ingestion checks, retrieval smoke tests, current-code routing tests, web search fallback tests, syntax checks, and manual review of representative questions. Docker support is included for deployment.
Validation Summary
| Test Area | Result | Notes |
|---|---|---|
| PDF ingestion | Passed | Loaded Cobb County and Georgia code PDFs |
| Vector index build | Passed | Indexed 13,844 chunks into Chroma |
| Local retrieval smoke test | Passed | Retrieved relevant fire inspection and code sources |
| LLM query router | Passed | Flags current, dated, adopted-code, and fee-schedule questions for verification |
| Web fallback | Passed | SerpAPI Google Search works from the app environment |
| Deployment support | Included | Streamlit Community Cloud, Dockerfile, and docker-compose.yml |
Applied ML Engineering Value
This project demonstrates how modern LLM systems can be adapted to document-heavy public-sector workflows where answers must be grounded, current, and easy to verify. It is especially relevant to customer-facing AI platforms, conversational question answering, search and retrieval, agent workflows, and intelligent automation.
- RAG system design: combines local retrieval, routing, evidence checking, and web fallback.
- Operational grounding: returns source references instead of unsupported open-ended answers.
- Production mindset: includes reproducible ingestion, Docker support, environment templates, and hosted app deployment.
- Observability: uses LangSmith traces to inspect routing decisions and generated outputs.
GitHub Repository & Live Demo
The full implementation and hosted Streamlit demo are available through the links below.
🔗 View Project Repository on GitHub
The repository includes the Streamlit app, LangChain agent workflow, Chroma retriever, ingestion scripts, Docker files, environment template, and documentation for rebuilding the local vector index.
Disclaimer: this is a portfolio demonstration and educational project. It is not legal, engineering, building code, fire code, or permitting advice. Users should verify requirements directly with Cobb County, the Georgia Department of Community Affairs, the State Fire Marshal, and the authority having jurisdiction.