Building a Markdown RAG System: A Practical Guide to Document-Grounded AI

Overview
Learn how to build a lightweight retrieval-augmented generation (RAG) system that lets users ask natural-language questions about markdown documentation. This implementation favors simplicity: it avoids the complexity of a vector database while keeping responses grounded in the source documents.

System Architecture

The RAG system consists of four main components working together to provide document-grounded AI responses:

  1. Document Processor: Parses and chunks markdown files
  2. Search Engine: Finds relevant document sections
  3. LLM Integration: Generates contextual responses
  4. Chat Interface: Handles user interactions

flowchart TD
    User[User] --> |Question| Chat[Chat Interface]
    Chat --> |Query| Search[Search Engine]
    Search --> |Retrieves| Docs[Markdown Documents]
    Search --> |Results| LLM[LLM Processor]
    LLM --> |Response| Chat
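
The flow in the diagram can be sketched as a single request handler. This is a minimal illustration rather than the article's implementation: `Chunk`, `Answer`, `SearchFn`, `GenerateFn`, and `answerQuestion` are hypothetical names, and the LLM call is injected as a function so that no particular provider API is assumed.

```typescript
// Hypothetical minimal types for illustrating the request flow.
interface Chunk {
  path: string;
  content: string;
}

interface Answer {
  text: string;
  sources: string[];
}

// Both stages are injected so the handler stays independent of any search or LLM library.
type SearchFn = (query: string) => Chunk[];
type GenerateFn = (systemPrompt: string, question: string) => Promise<string>;

async function answerQuestion(
  question: string,
  search: SearchFn,
  generate: GenerateFn
): Promise<Answer> {
  // 1. Retrieve the chunks most relevant to the question.
  const chunks = search(question);

  // 2. Short-circuit when nothing relevant was found instead of letting the model guess.
  if (chunks.length === 0) {
    return { text: "I couldn't find that in the documentation.", sources: [] };
  }

  // 3. Ground the model in the retrieved text only.
  const context = chunks
    .map(chunk => `Source: ${chunk.path}\n${chunk.content}`)
    .join('\n---\n');
  const text = await generate(`Answer using ONLY this context:\n\n${context}`, question);

  // 4. Return the answer together with the documents it was grounded in.
  return { text, sources: chunks.map(chunk => chunk.path) };
}
```

Passing the search and generation steps in as functions keeps each component replaceable, which also makes the handler easy to exercise with stubs.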

Document Processing

Intelligent Chunking

The system splits documents on paragraph boundaries and carries a small overlap from one chunk into the next so that context is preserved across chunk edges:

interface ChunkOptions {
  maxChunkSize: number;
  overlapSize: number;
}

function chunkDocument(content: string, options: ChunkOptions): string[] {
  const { maxChunkSize, overlapSize } = options;
  const chunks: string[] = [];
  
  // Split by paragraphs
  const paragraphs = content.split(/\n\s*\n/);
  let currentChunk = '';
  
  for (const paragraph of paragraphs) {
    if (currentChunk.length + paragraph.length > maxChunkSize && currentChunk.length > 0) {
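      chunks.push(currentChunk.trimEnd());
      // Keep overlap for context: reuse up to the last three paragraphs if they fit
      const lastParagraphs = currentChunk
        .trimEnd() // Drop the trailing separator so split() yields no empty paragraph
        .split(/\n\s*\n/)
        .slice(-3)
        .join('\n\n');
      
      currentChunk = lastParagraphs.length <= overlapSize 
        ? lastParagraphs + '\n\n' 
        : '';
    }
    
    currentChunk += paragraph + '\n\n';
  }
  
  // Flush the final chunk, which the loop above never pushes
  if (currentChunk.trim().length > 0) {
    chunks.push(currentChunk.trimEnd());
  }
  
  return chunks;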
}

Metadata Extraction

Each document chunk includes rich metadata for better context:

interface DocumentChunk {
  id: string;
  path: string;
  title: string;
  content: string;
  metadata: Record<string, any>;
  headings: string[];
  tokens: string[];
  createdAt: Date;
  updatedAt: Date;
}
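
One way the `title` and `headings` fields could be populated is by scanning the markdown for ATX-style headings (`#` through `######`). The sketch below assumes that convention. `extractHeadings` and `deriveTitle` are hypothetical helper names, and fenced code blocks are skipped so that `#` comments inside code are not mistaken for headings.

```typescript
// Hypothetical helper: collect ATX headings ("# Title" ... "###### Title") from markdown text.
function extractHeadings(markdown: string): string[] {
  const headings: string[] = [];
  let inFence = false;

  for (const line of markdown.split('\n')) {
    // Toggle on fence markers (three backticks or tildes) so code lines are ignored.
    if (/^\s*(`{3}|~{3})/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;

    // One to six '#' characters, at least one space, then the heading text.
    const match = /^ {0,3}#{1,6}\s+(.+?)\s*$/.exec(line);
    const text = match ? match[1] : undefined;
    if (text) headings.push(text);
  }
  return headings;
}

// Hypothetical helper: prefer the first heading, else the file name without its extension.
function deriveTitle(path: string, headings: string[]): string {
  const firstHeading = headings[0];
  if (firstHeading !== undefined) return firstHeading;
  const fileName = path.split('/').pop() || path;
  return fileName.replace(/\.md$/i, '');
}
```

Falling back to the file name means every chunk has a usable title even when a document has no headings at all.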

Search Implementation

Instead of using complex vector embeddings, we implement a hybrid search approach combining:

  1. Fuzzy Matching: Using Fuse.js for typo tolerance
  2. Term Frequency: For content relevance scoring
  3. Phrase Matching: For exact matches

class SearchEngine {
  private calculateRelevance(query: string[], document: DocumentChunk): number {
    const tfScore = this.calculateTF(query, document);
    const phraseScore = this.phraseMatchBoost(query.join(' '), document);
    const headingScore = this.headingMatchScore(query, document);
    
    return (tfScore * 0.6) + (phraseScore * 0.3) + (headingScore * 0.1);
  }
  
  search(query: string, limit: number = 5): SearchResult[] {
    const queryTokens = tokenize(query);
    const results = this.documents.map(doc => ({
      document: doc,
      score: this.calculateRelevance(queryTokens, doc)
    }));
    
    return results
      .filter(result => result.score > THRESHOLD)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }
}
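
The class above calls `tokenize`, `calculateTF`, `phraseMatchBoost`, and `headingMatchScore` without showing them. The sketch below gives one plausible reading of each, written as standalone functions. These bodies are assumptions, not the article's code: every score is normalized to the range 0 to 1 so that the 0.6 / 0.3 / 0.1 weights stay comparable, and `ScorableChunk` and `relevance` are hypothetical names.

```typescript
// Hypothetical, minimal shape: only the fields the scoring functions read.
interface ScorableChunk {
  content: string;
  headings: string[];
  tokens: string[];
}

// Lowercase and split on anything that is not an ASCII letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(token => token.length > 0);
}

// Term frequency: share of the chunk's tokens that match any query token.
function calculateTF(queryTokens: string[], chunk: ScorableChunk): number {
  if (chunk.tokens.length === 0) return 0;
  const matches = chunk.tokens.filter(token => queryTokens.indexOf(token) >= 0).length;
  return matches / chunk.tokens.length;
}

// Phrase match: 1 when the whole query appears verbatim (case-insensitive), else 0.
function phraseMatchBoost(phrase: string, chunk: ScorableChunk): number {
  const needle = phrase.trim().toLowerCase();
  if (needle.length === 0) return 0;
  return chunk.content.toLowerCase().indexOf(needle) >= 0 ? 1 : 0;
}

// Heading match: share of query tokens that appear in any heading.
function headingMatchScore(queryTokens: string[], chunk: ScorableChunk): number {
  if (queryTokens.length === 0) return 0;
  const headingTokens = tokenize(chunk.headings.join(' '));
  const hits = queryTokens.filter(token => headingTokens.indexOf(token) >= 0).length;
  return hits / queryTokens.length;
}

// Weighted sum using the weights from the article.
function relevance(query: string, chunk: ScorableChunk): number {
  const queryTokens = tokenize(query);
  return (
    calculateTF(queryTokens, chunk) * 0.6 +
    phraseMatchBoost(query, chunk) * 0.3 +
    headingMatchScore(queryTokens, chunk) * 0.1
  );
}
```

Because raw term frequency is small for long chunks, a verbatim phrase match will often dominate the total under these definitions, so the weights and the threshold are worth tuning against real queries.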

LLM Integration

The system prompt restricts the model to the retrieved chunks and asks it to cite sources and acknowledge gaps, which keeps responses grounded in the documentation:

function buildSystemPrompt(relevantDocs: DocumentChunk[]): string {
  const contextSections = relevantDocs.map(doc => `
Source: ${doc.path}
Title: ${doc.title}
Content:
${doc.content}
---`).join('\n\n');
  
  return `You are an AI assistant specialized in answering questions about the documentation.
Answer questions based ONLY on the following information:

${contextSections}

Guidelines:
1. Use ONLY the provided information
2. If information is not in context, acknowledge the limitation
3. Cite sources when possible
4. Format responses in Markdown
5. Be concise but thorough
6. Explain technical terms`;
}

Context Management

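The system rebuilds the system prompt from the freshly retrieved chunks on every turn and keeps only the six most recent messages, so the conversation history stays focused: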

function buildPromptWithRecentContext(
  messages: Message[],
  relevantDocs: DocumentChunk[]
): { systemPrompt: string; userMessages: Message[] } {
  const systemPrompt = buildSystemPrompt(relevantDocs);
  const recentMessages = messages.slice(-6); // Keep conversation focused
  
  return { systemPrompt, userMessages: recentMessages };
}
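
The two returned pieces still have to be combined into whatever the LLM client expects. The sketch below assumes a role-tagged message list, a shape many chat-style APIs use. The `PromptMessage` type, the role names, and `toChatMessages` are assumptions for illustration (the article does not show its `Message` type), and no provider call is made.

```typescript
// Assumed message shape; the article's Message type is not shown.
type Role = 'system' | 'user' | 'assistant';

interface PromptMessage {
  role: Role;
  content: string;
}

// Hypothetical helper: prepend the system prompt to the most recent turns.
function toChatMessages(
  systemPrompt: string,
  messages: PromptMessage[],
  maxHistory: number = 6
): PromptMessage[] {
  // Drop stale system messages so the freshly retrieved context is the only one sent.
  const turns = messages.filter(message => message.role !== 'system');
  // slice(-0) would return the whole array, so guard the zero case explicitly.
  const recent = maxHistory > 0 ? turns.slice(-maxHistory) : [];
  const systemMessage: PromptMessage = { role: 'system', content: systemPrompt };
  return [systemMessage].concat(recent);
}
```

Trimming by message count is simple but ignores message length, so a token-based budget may be a better fit for long conversations.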

Performance Optimization

The system caches processed documents in memory and reprocesses them at most once an hour, so most requests skip markdown parsing entirely:

let documentsCache: DocumentChunk[] = [];
let lastProcessed: Date | null = null;

async function getProcessedDocuments(): Promise<DocumentChunk[]> {
  // Reprocess when the cache is empty or more than one hour (3,600,000 ms) old
  const shouldReprocess = 
    documentsCache.length === 0 || 
    !lastProcessed ||
    Date.now() - lastProcessed.getTime() > 3600000;
  
  if (shouldReprocess) {
    documentsCache = await processMarkdownDocuments(MARKDOWN_DIRECTORIES);
    lastProcessed = new Date();
  }
  
  return documentsCache;
}
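
One caveat with the cache above: if several requests arrive while the cache is stale, each of them can start its own reprocessing run. A possible refinement, sketched here under assumptions, is to cache the promise rather than the result so that concurrent callers share a single load. `createTtlCache` is a hypothetical helper, and the injectable `now` clock exists only to make the behavior easy to test.

```typescript
// Hypothetical generic helper: caches the result of an async loader for ttlMs milliseconds.
function createTtlCache<T>(
  load: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now
): () => Promise<T> {
  let pending: Promise<T> | null = null;
  let loadedAt = 0;

  return function get(): Promise<T> {
    // Serve the cached promise while it is still within its time-to-live.
    if (pending !== null && now() - loadedAt <= ttlMs) return pending;

    // Store the promise itself so concurrent callers share one load.
    loadedAt = now();
    const current = load();
    pending = current;

    // On failure, clear the entry so the next call retries instead of caching the error.
    current.catch(() => {
      if (pending === current) pending = null;
    });
    return current;
  };
}
```

With this helper, the hourly cache could be expressed as `createTtlCache(() => processMarkdownDocuments(MARKDOWN_DIRECTORIES), 3600000)`, assuming those two names behave as they do in the snippet above.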

User Interface

The system provides a clean, responsive chat interface with:

  1. Markdown Rendering: For formatted responses
  2. Source Citations: Linking to original documents
  3. Context Awareness: Understanding conversation flow

function ChatMessage({ message }: ChatMessageProps) {
  return (
    <div className="flex justify-start">
      <div className="max-w-[80%] rounded-lg p-3 bg-gray-100">
        <MarkdownRenderer content={message.content} />
        
        {message.sources && (
          <div className="mt-2 pt-2 border-t">
            <div className="text-xs font-medium">Sources:</div>
            {message.sources.map(source => (
              <SourceCitation key={source.chunkId} source={source} />
            ))}
          </div>
        )}
      </div>
    </div>
  );
}
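
The component reads `message.sources`, which has to be derived from the search results at some point. A minimal sketch of that step follows. Only `chunkId` appears in the article, so the remaining `SourceRef` fields, the `RankedChunk` shape, and `toSources` are assumptions for illustration.

```typescript
// Assumed shapes: only `chunkId` appears in the article; the other fields are illustrative.
interface RankedChunk {
  document: { id: string; path: string; title: string };
  score: number;
}

interface SourceRef {
  chunkId: string;
  path: string;
  title: string;
}

// Hypothetical helper: one citation per file, keeping the highest-scoring chunk for each.
function toSources(results: RankedChunk[]): SourceRef[] {
  const seenPaths = new Set<string>();
  const sources: SourceRef[] = [];

  // Sort a copy so the caller's array is left untouched.
  const ranked = results.slice().sort((a, b) => b.score - a.score);
  for (const result of ranked) {
    const { id, path, title } = result.document;
    if (seenPaths.has(path)) continue;
    seenPaths.add(path);
    sources.push({ chunkId: id, path, title });
  }
  return sources;
}
```

Deduplicating by path avoids listing the same file several times when multiple chunks from it were retrieved.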

Key Benefits

  1. Simplicity: No complex vector database setup
  2. Performance: Fast in-memory search and caching
  3. Accuracy: Document-grounded responses
  4. Maintainability: Easy to update and extend
  5. Integration: Works with existing markdown docs

Limitations & Considerations

  - Search is keyword-based, may miss semantic relationships
  - In-memory processing limits document scale
  - No persistent conversation storage
  - Requires careful prompt engineering

Future Enhancements

  1. Vector Search: Add optional embedding-based search
  2. Conversation Storage: Implement persistence
  3. Streaming Responses: Improve response UX
  4. Advanced Context: Add semantic understanding
  5. Multi-Modal: Support for images and diagrams

Conclusion

This lightweight RAG implementation demonstrates that you can build a powerful document-grounded AI system without complex infrastructure. By focusing on practical search strategies and careful prompt engineering, we achieve high-quality responses while maintaining simplicity and performance.


About the Author: Justin Johnson builds AI systems and writes about practical AI development.
