Building a Markdown RAG System: A Practical Guide to Document-Grounded AI
System Architecture
The RAG system consists of four main components working together to provide document-grounded AI responses:
- Document Processor: Parses and chunks markdown files
- Search Engine: Finds relevant document sections
- LLM Integration: Generates contextual responses
- Chat Interface: Handles user interactions
flowchart TD
User[User] --> |Question| Chat[Chat Interface]
Chat --> |Query| Search[Search Engine]
Search --> |Retrieves| Docs[Markdown Documents]
Search --> |Results| LLM[LLM Processor]
LLM --> |Response| Chat
Document Processing
Intelligent Chunking
The system splits documents on paragraph boundaries and carries the last few paragraphs of each chunk into the next one, so context survives the split:
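interface ChunkOptions {
  maxChunkSize: number; // maximum characters per chunk
  overlapSize: number;  // maximum characters carried into the next chunk
}
function chunkDocument(content: string, options: ChunkOptions): string[] {
  const { maxChunkSize, overlapSize } = options;
  const chunks: string[] = [];
  // Split by paragraphs (separated by blank lines)
  const paragraphs = content.split(/\n\s*\n/);
  let currentChunk = '';
  for (const paragraph of paragraphs) {
    if (currentChunk.length + paragraph.length > maxChunkSize && currentChunk.length > 0) {
      chunks.push(currentChunk.trim());
      // Keep overlap for context: the last three paragraphs, if they fit.
      // Trim first so the trailing separator does not count as an empty paragraph.
      const lastParagraphs = currentChunk
        .trim()
        .split(/\n\s*\n/)
        .slice(-3)
        .join('\n\n');
      currentChunk = lastParagraphs.length <= overlapSize
        ? lastParagraphs + '\n\n'
        : '';
    }
    currentChunk += paragraph + '\n\n';
  }
  // Flush the final chunk; without this the tail of the document is dropped
  if (currentChunk.trim().length > 0) {
    chunks.push(currentChunk.trim());
  }
  return chunks;
}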
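One case the paragraph-based chunker does not cover is a single paragraph longer than `maxChunkSize`, which passes through as one oversized chunk. The following is a minimal sketch of one way to pre-split such paragraphs, assuming a naive sentence split is acceptable. `splitOversizedParagraph` is a hypothetical helper, not part of the original code.

```typescript
// Hypothetical helper: break a paragraph that exceeds maxSize into smaller pieces.
// Splits on sentence-ending punctuation first, then hard-cuts any sentence that is
// still too long. The sentence split is naive ("e.g." or "3.14" will be cut).
function splitOversizedParagraph(paragraph: string, maxSize: number): string[] {
  if (maxSize <= 0) {
    throw new Error("maxSize must be positive");
  }
  if (paragraph.length <= maxSize) {
    return [paragraph];
  }
  // Each match is one sentence plus its trailing whitespace, or the unpunctuated tail.
  const sentences = paragraph.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) || [paragraph];
  const pieces: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Close the current piece if adding this sentence would overflow it.
    if (current.length > 0 && current.length + sentence.length > maxSize) {
      pieces.push(current.trim());
      current = "";
    }
    current += sentence;
    // A single sentence longer than maxSize gets a hard cut.
    while (current.length > maxSize) {
      pieces.push(current.slice(0, maxSize));
      current = current.slice(maxSize);
    }
  }
  if (current.trim().length > 0) {
    pieces.push(current.trim());
  }
  return pieces;
}
```

Running each paragraph through this helper before the main loop would keep every chunk within the size limit, at the cost of occasionally splitting mid-thought.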
Metadata Extraction
Each document chunk includes rich metadata for better context:
interface DocumentChunk {
  id: string;
  path: string;
  title: string;
  content: string;
  metadata: Record<string, any>;
  headings: string[];
  tokens: string[];
  createdAt: Date;
  updatedAt: Date;
}
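The article does not show how these fields are populated. As a rough sketch, the title, headings, tokens, and metadata could be derived from the raw markdown as follows. All function names here are hypothetical, and the front matter parser assumes flat `key: value` lines only; real YAML front matter would need a proper parser.

```typescript
interface ExtractedFields {
  title: string;
  headings: string[];
  tokens: string[];
  metadata: Record<string, string>;
}

// Assumption: front matter is a leading block delimited by '---' lines
// that contains only flat "key: value" pairs.
function parseFrontMatter(raw: string): { metadata: Record<string, string>; body: string } {
  const match = raw.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?/);
  if (!match) {
    return { metadata: {}, body: raw };
  }
  const metadata: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep > 0) {
      metadata[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
    }
  }
  return { metadata, body: raw.slice(match[0].length) };
}

// Collect ATX headings ("# Title"), skipping lines inside fenced code blocks.
function extractHeadings(body: string): string[] {
  const headings: string[] = [];
  let inFence = false;
  for (const line of body.split(/\r?\n/)) {
    if (/^\s*(`{3,}|~{3,})/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    const m = line.match(/^#{1,6}\s+(.+?)(?:\s+#+)?\s*$/);
    if (m) headings.push(m[1]);
  }
  return headings;
}

// Lowercase, split on non-alphanumerics, drop single-character tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 1);
}

function extractFields(path: string, raw: string): ExtractedFields {
  const { metadata, body } = parseFrontMatter(raw);
  const headings = extractHeadings(body);
  // Prefer an explicit title, then the first heading, then the file path.
  const title = metadata.title || headings[0] || path;
  return { title, headings, tokens: tokenize(body), metadata };
}
```

The ASCII-only tokenizer is a deliberate simplification; documentation in other scripts would need a Unicode-aware split.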
Search Implementation
Instead of using complex vector embeddings, we implement a hybrid search approach combining:
- Fuzzy Matching: Using Fuse.js for typo tolerance
- Term Frequency: For content relevance scoring
- Phrase Matching: For exact matches
// Excerpt: this.documents, tokenize, THRESHOLD, SearchResult, and the three
// scoring helpers (calculateTF, phraseMatchBoost, headingMatchScore) are not shown here.
class SearchEngine {
  // Weighted sum: term frequency 60%, exact phrase match 30%, heading match 10%
  private calculateRelevance(query: string[], document: DocumentChunk): number {
    const tfScore = this.calculateTF(query, document);
    const phraseScore = this.phraseMatchBoost(query.join(' '), document);
    const headingScore = this.headingMatchScore(query, document);
    return (tfScore * 0.6) + (phraseScore * 0.3) + (headingScore * 0.1);
  }
  search(query: string, limit: number = 5): SearchResult[] {
    const queryTokens = tokenize(query);
    // Score every chunk, drop weak matches, and return the top `limit` results
    const results = this.documents.map(doc => ({
      document: doc,
      score: this.calculateRelevance(queryTokens, doc)
    }));
    return results
      .filter(result => result.score > THRESHOLD)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }
}
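The three scoring helpers are not defined in the excerpt. The sketch below shows one plausible way to write them as standalone functions, each returning a value between 0 and 1 so the 0.6 / 0.3 / 0.1 weights stay meaningful. The bodies are assumptions rather than the original implementation, `ScorableChunk` and `toTerms` are hypothetical names, and query terms are assumed to be lowercase already.

```typescript
// Minimal shape needed for scoring; mirrors three fields of DocumentChunk.
interface ScorableChunk {
  content: string;
  headings: string[];
  tokens: string[];
}

function toTerms(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Share of the chunk's tokens that are query terms (normalized by chunk length,
// so long chunks are not favored just for being long).
function calculateTF(query: string[], doc: ScorableChunk): number {
  if (query.length === 0 || doc.tokens.length === 0) return 0;
  const terms = new Set(query);
  let hits = 0;
  for (const token of doc.tokens) {
    if (terms.has(token)) hits += 1;
  }
  return hits / doc.tokens.length;
}

// 1 if the whole query appears verbatim (case-insensitive) in the content, else 0.
function phraseMatchBoost(phrase: string, doc: ScorableChunk): number {
  const needle = phrase.trim().toLowerCase();
  if (needle.length === 0) return 0;
  return doc.content.toLowerCase().indexOf(needle) >= 0 ? 1 : 0;
}

// Fraction of query terms that occur in any of the chunk's headings.
function headingMatchScore(query: string[], doc: ScorableChunk): number {
  if (query.length === 0) return 0;
  const headingTerms = new Set(toTerms(doc.headings.join(" ")));
  const matched = query.filter((term) => headingTerms.has(term)).length;
  return matched / query.length;
}

// Same weighting as calculateRelevance in the article.
function relevance(query: string[], doc: ScorableChunk): number {
  return (
    calculateTF(query, doc) * 0.6 +
    phraseMatchBoost(query.join(" "), doc) * 0.3 +
    headingMatchScore(query, doc) * 0.1
  );
}
```

For a chunk with content "Install the CLI tool" under the heading "Install", the query `["install", "cli"]` matches two of four tokens and one heading term but not the exact phrase, giving roughly 0.35.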
LLM Integration
The system prompt embeds the retrieved chunks together with their source paths and instructs the model to answer from that context only:
function buildSystemPrompt(relevantDocs: DocumentChunk[]): string {
  const contextSections = relevantDocs.map(doc => `
Source: ${doc.path}
Title: ${doc.title}
Content:
${doc.content}
---`).join('\n\n');
  return `You are an AI assistant specialized in answering questions about the documentation.
Answer questions based ONLY on the following information:
${contextSections}
Guidelines:
1. Use ONLY the provided information
2. If information is not in context, acknowledge the limitation
3. Cite sources when possible
4. Format responses in Markdown
5. Be concise but thorough
6. Explain technical terms`;
}
Context Management
Each request carries only the six most recent messages of the conversation, alongside a system prompt rebuilt from the current search results:
function buildPromptWithRecentContext(
  messages: Message[],
  relevantDocs: DocumentChunk[]
): { systemPrompt: string; userMessages: Message[] } {
  const systemPrompt = buildSystemPrompt(relevantDocs);
  const recentMessages = messages.slice(-6); // Keep conversation focused
  return { systemPrompt, userMessages: recentMessages };
}
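How the system prompt and recent messages are handed to the model is not shown. A minimal sketch, assuming the role/content message layout common to chat-style LLM APIs, might look like this. `ChatTurn` and `toModelMessages` are hypothetical names, and the `Message` type used in the article may differ.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatTurn {
  role: Role;
  content: string;
}

// Build the message list for one request: a fresh system prompt followed by a
// window of recent turns. Leading assistant turns are dropped so the window
// always opens with a user message.
function toModelMessages(
  systemPrompt: string,
  history: ChatTurn[],
  maxTurns: number = 6
): ChatTurn[] {
  const recent = history
    .filter((turn) => turn.role !== "system") // stale system prompts are replaced
    .slice(-maxTurns);
  let start = 0;
  while (start < recent.length && recent[start].role !== "user") {
    start += 1;
  }
  const systemTurn: ChatTurn = { role: "system", content: systemPrompt };
  return [systemTurn, ...recent.slice(start)];
}
```

Rebuilding the system turn on every request means the model always sees the chunks retrieved for the latest question, not those from earlier turns.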
// Cache processed documents in memory and reprocess them at most once per hour
let documentsCache: DocumentChunk[] = [];
let lastProcessed: Date | null = null;
async function getProcessedDocuments() {
  const shouldReprocess =
    documentsCache.length === 0 ||
    !lastProcessed ||
    (new Date().getTime() - lastProcessed.getTime() > 3600000); // 1 hour in ms
  if (shouldReprocess) {
    documentsCache = await processMarkdownDocuments(MARKDOWN_DIRECTORIES);
    lastProcessed = new Date();
  }
  return documentsCache;
}
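With the cache as written, two requests that arrive while the documents are being processed would each trigger a full reprocess. The sketch below shows one way to apply the same one-hour policy while sharing a single in-flight load. `createCachedLoader` and `isStale` are hypothetical names, and the injectable clock exists only to make the expiry logic testable.

```typescript
const ONE_HOUR_MS = 60 * 60 * 1000;

// True when nothing has been loaded yet or the last load is older than ttlMs.
function isStale(lastLoaded: Date | null, now: Date, ttlMs: number = ONE_HOUR_MS): boolean {
  return lastLoaded === null || now.getTime() - lastLoaded.getTime() > ttlMs;
}

// Wrap an async loader so results are reused until they expire, and concurrent
// callers share one in-flight load instead of each starting their own.
function createCachedLoader<T>(
  load: () => Promise<T>,
  ttlMs: number = ONE_HOUR_MS,
  now: () => Date = () => new Date()
): () => Promise<T> {
  let cached: T | undefined;
  let lastLoaded: Date | null = null;
  let inFlight: Promise<T> | null = null;

  return () => {
    if (!isStale(lastLoaded, now(), ttlMs)) {
      // lastLoaded is only set after a successful load, so cached holds a value.
      return Promise.resolve(cached as T);
    }
    if (inFlight === null) {
      inFlight = load().then(
        (value) => {
          cached = value;
          lastLoaded = now();
          inFlight = null;
          return value;
        },
        (error) => {
          inFlight = null; // allow a retry on the next call
          throw error;
        }
      );
    }
    return inFlight;
  };
}
```

Under these assumptions, `getProcessedDocuments` could be created once with `createCachedLoader(() => processMarkdownDocuments(MARKDOWN_DIRECTORIES))`. Unlike the original, this version does not force a reload when the cached list is empty.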
User Interface
The system provides a clean, responsive chat interface with:
- Markdown Rendering: For formatted responses
- Source Citations: Linking to original documents
- Context Awareness: Understanding conversation flow
function ChatMessage({ message }: ChatMessageProps) {
  return (
    <div className="flex justify-start">
      <div className="max-w-[80%] rounded-lg p-3 bg-gray-100">
        <MarkdownRenderer content={message.content} />
        {message.sources && (
          <div className="mt-2 pt-2 border-t">
            <div className="text-xs font-medium">Sources:</div>
            {message.sources.map(source => (
              <SourceCitation key={source.chunkId} source={source} />
            ))}
          </div>
        )}
      </div>
    </div>
  );
}
Key Benefits
- Simplicity: No complex vector database setup
- Performance: Fast in-memory search and caching
- Accuracy: Document-grounded responses
- Maintainability: Easy to update and extend
- Integration: Works with existing markdown docs
Future Enhancements
- Vector Search: Add optional embedding-based search
- Conversation Storage: Implement persistence
- Streaming Responses: Improve response UX
- Advanced Context: Add semantic understanding
- Multi-Modal: Support for images and diagrams
Conclusion
This lightweight RAG implementation demonstrates that you can build a powerful document-grounded AI system without complex infrastructure. By focusing on practical search strategies and careful prompt engineering, we achieve high-quality responses while maintaining simplicity and performance.
About the Author: Justin Johnson builds AI systems and writes about practical AI development.