Practical ApplicationsOctober 19, 202510 min readshipped

Building a Production ML Workspace: Part 4 - Production-Ready AI Agent Templates

You've organized your workspace, built documentation systems, and implemented experiment tracking. Your ML workflows are reproducible and well-documented. Now it's time to tackle the most complex artifact in your workspace: AI agents.

Unlike experiments (which run once) or models (which are static), agents are dynamic systems that interact with tools, maintain state, and make decisions. Without proper structure, agent development quickly becomes tangled code with unclear responsibilities and impossible debugging.

This article shows you how to build production-ready AI agents using standardized templates, clear architecture patterns, comprehensive testing, and deployment readiness frameworks.

About This Series

This is Part 4 of a 5-part series on building production ML workspaces. Previous parts:

Coming next:

Part 5: Ollama Model Management and Workflow Integration

The Agent Development Problem

AI agents present unique challenges that experiments and models don't:

Complexity Challenges:

Multiple components (LLM, tools, memory, state)
Complex interaction patterns
Error handling across tool calls
State management and conversation history
Tool orchestration and chaining

Production Challenges:

How do you test an agent thoroughly?
How do you debug multi-step reasoning?
How do you version control prompts?
How do you monitor production behavior?
How do you handle tool failures gracefully?

Organizational Challenges:

Prototypes that never reach production
Unclear distinction between experimental and production code
Lack of reusable components
No standardized agent structure
Difficulty onboarding to existing agents

The Two-Track Agent System

Our solution: Separate tracks for prototype and production agents with clear promotion criteria.

Agent Development System
│
├── Prototype Track (agents/prototypes/)
│   ├── Fast iteration
│   ├── Minimal documentation
│   ├── Breaking changes OK
│   └── No tests required
│
└── Production Track (agents/production/)
    ├── Stable API
    ├── Comprehensive docs
    ├── Full test coverage
    └── Monitoring & logging

When to Use Each Track

Prototype Track:

Initial exploration and experimentation
Testing new LLM capabilities
Rapid tool integration trials
Proof-of-concept development
Research and learning

Production Track:

Agents used by others or in automation
Critical business workflows
Public-facing applications
Deployed services
Agents with users who expect reliability

Production Agent Template Structure

Every production agent gets this standardized structure:

agents/production/agent-name/
├── README.md                  # Complete documentation
├── agent.py                   # Main agent code
├── config.yaml               # Configuration
├── requirements.txt          # Dependencies
├── environment.yaml          # Full environment
├── prompts/                  # Version-controlled prompts
│   ├── system.txt           # System prompt
│   ├── user_template.txt    # User prompt template
│   └── few_shot.yaml        # Few-shot examples
├── tools/                    # Agent tools
│   ├── __init__.py
│   ├── tool_registry.py     # Tool management
│   └── [tool_name].py       # Individual tools
├── memory/                   # State and memory
│   ├── conversation.py      # Conversation history
│   ├── context.py          # Context management
│   └── state.py            # Agent state
├── tests/                   # Comprehensive tests
│   ├── test_agent.py       # Agent behavior tests
│   ├── test_tools.py       # Tool tests
│   └── test_integration.py # End-to-end tests
├── logs/                    # Execution logs
│   └── [timestamp].log
├── metrics/                 # Performance tracking
│   └── metrics.json
└── deployment/             # Deployment assets
    ├── Dockerfile
    ├── docker-compose.yml
    └── deploy.sh

The Complete Agent README

Location: agents/production/[agent-name]/README.md

# Agent: [Agent Name]

**Status:** 🟢 Production | 🟡 Beta | 🔴 Experimental
**Version:** v1.0.0
**Last Updated:** YYYY-MM-DD
**Maintainer:** Your Name

---

## Quick Summary

**Purpose:** [One sentence describing what this agent does]

**Use Cases:**
1. [Primary use case]
2. [Secondary use case]
3. [Additional use case]

**Key Capabilities:**
- [Capability 1]
- [Capability 2]
- [Capability 3]

---

## Quick Start

### Installation

```bash
# Clone or navigate to agent directory
cd agents/production/agent-name

# Create environment
conda env create -f environment.yaml
conda activate agent-env

# Install dependencies
pip install -r requirements.txt

# Verify installation
python agent.py --verify

Basic Usage

from agent import AgentName

# Initialize agent
agent = AgentName(
    model="llama3.1:8b",
    temperature=0.7,
    max_iterations=5
)

# Run agent
result = agent.run("Your task here")
print(result)

Configuration

Edit config.yaml:

model:
  name: "llama3.1:8b"
  temperature: 0.7
  max_tokens: 2000

agent:
  max_iterations: 5
  max_tool_calls: 10
  timeout_seconds: 300

Architecture

System Overview

User Input
    ↓
[Agent Core] ←→ [LLM (Ollama)]
    ↓
[Tool Orchestrator]
    ↓
[Tools] → [External APIs/Systems]
    ↓
[Memory/State]
    ↓
Agent Response

Components

Agent Core (agent.py):

Main agent logic and reasoning loop
Decision-making and planning
Error handling and recovery
Response generation

Tool System (tools/):

Tool registry and management
Individual tool implementations
Tool call validation
Result processing

Memory System (memory/):

Conversation history management
Context window management
State persistence
Session management

Prompt System (prompts/):

System prompt definition
User prompt templates
Few-shot examples
Prompt versioning

Detailed Usage

Basic Example

from agent import CustomerSupportAgent

# Initialize
agent = CustomerSupportAgent()

# Simple query
response = agent.run(
    query="How do I reset my password?",
    context={"user_id": "12345"}
)

print(response.text)
print(f"Tools used: {response.tools_called}")
print(f"Confidence: {response.confidence}")

Advanced Example

# Multi-turn conversation
agent = CustomerSupportAgent()

conversation_id = agent.start_conversation(
    user_id="12345",
    initial_context={"account_type": "premium"}
)

# Turn 1
response1 = agent.continue_conversation(
    conversation_id=conversation_id,
    message="I can't log in to my account"
)

# Turn 2
response2 = agent.continue_conversation(
    conversation_id=conversation_id,
    message="I tried that already, still not working"
)

# Get conversation history
history = agent.get_conversation_history(conversation_id)

Streaming Response

# Stream agent response
for chunk in agent.run_streaming("Complex analysis task"):
    print(chunk.text, end="")
    if chunk.tool_call:
        print(f"\n[Using tool: {chunk.tool_call.name}]")

Tools Available

Built-in Tools

knowledge_base_search
- Purpose: Search internal knowledge base
- Input: Query string
- Output: Relevant documents
- Example: {"query": "password reset policy"}
api_call
- Purpose: Call external API
- Input: Endpoint, method, params
- Output: API response
- Example: {"endpoint": "/user/status", "method": "GET"}
database_query
- Purpose: Query user database
- Input: Query parameters
- Output: User data
- Example: {"user_id": "12345", "fields": ["email", "status"]}

Adding Custom Tools

# tools/custom_tool.py
from tools.base import BaseTool

class CustomTool(BaseTool):
    name = "custom_tool"
    description = "What this tool does"

    def __init__(self):
        super().__init__()

    def execute(self, **kwargs):
        """Execute tool logic"""
        # Your implementation
        return {"result": "value"}

    def validate_input(self, **kwargs):
        """Validate inputs"""
        required = ["param1", "param2"]
        return all(k in kwargs for k in required)

from tools.custom_tool import CustomTool

TOOLS = {
    "custom_tool": CustomTool(),
    # ... other tools
}

Configuration Reference

config.yaml Structure

# Model configuration
model:
  name: "llama3.1:8b"          # Ollama model
  temperature: 0.7              # Sampling temperature
  max_tokens: 2000             # Max response length
  top_p: 0.9                   # Nucleus sampling

# Agent behavior
agent:
  max_iterations: 5            # Max reasoning loops
  max_tool_calls: 10          # Max tool invocations
  timeout_seconds: 300        # Overall timeout
  retry_on_error: true        # Retry failed operations
  max_retries: 3              # Max retry attempts

# Tools configuration
tools:
  enabled:
    - knowledge_base_search
    - api_call
    - database_query
  timeout_per_tool: 30        # Seconds per tool

# Memory configuration
memory:
  max_history_length: 10      # Messages to keep
  context_window: 4096        # Token limit
  summarize_old: true         # Summarize old messages

# Logging
logging:
  level: "INFO"
  log_tool_calls: true
  log_prompts: false          # Sensitive: disable in prod
  save_conversations: true

Testing

Running Tests

# All tests
pytest tests/

# Specific test file
pytest tests/test_agent.py

# With coverage
pytest --cov=. tests/

# Verbose output
pytest -v tests/

Test Structure

Unit Tests (test_tools.py):

def test_knowledge_base_search():
    """Test knowledge base search tool"""
    tool = KnowledgeBaseSearch()
    result = tool.execute(query="test query")
    assert result["documents"] is not None
    assert len(result["documents"]) > 0

Integration Tests (test_integration.py):

def test_agent_conversation_flow():
    """Test complete conversation flow"""
    agent = CustomerSupportAgent()

    # Start conversation
    conv_id = agent.start_conversation(user_id="test")

    # First message
    response1 = agent.continue_conversation(
        conversation_id=conv_id,
        message="I need help"
    )
    assert response1.text is not None

    # Follow-up
    response2 = agent.continue_conversation(
        conversation_id=conv_id,
        message="Tell me more"
    )
    assert len(response2.conversation_history) == 4  # 2 exchanges

Behavior Tests (test_agent.py):

def test_agent_handles_tool_failure():
    """Test graceful handling of tool failures"""
    agent = CustomerSupportAgent()

    # Mock tool to fail
    with patch('tools.api_call.execute') as mock_tool:
        mock_tool.side_effect = Exception("API error")

        response = agent.run("Query requiring API call")

        # Agent should recover gracefully
        assert response.text is not None
        assert "error" in response.text.lower()
        assert response.success is False

Monitoring & Metrics

Metrics Tracked

{
  "conversation_id": "abc123",
  "timestamp": "2024-10-19T14:30:00",
  "duration_seconds": 4.2,
  "model_used": "llama3.1:8b",
  "tokens": {
    "prompt": 1024,
    "completion": 256,
    "total": 1280
  },
  "tools_called": [
    {
      "name": "knowledge_base_search",
      "duration_ms": 120,
      "success": true
    }
  ],
  "iterations": 2,
  "success": true,
  "error": null,
  "user_feedback": null
}

Accessing Metrics

# Get agent metrics
from metrics import AgentMetrics

metrics = AgentMetrics.load("metrics/metrics.json")

# Summary statistics
print(f"Total conversations: {metrics.total_conversations}")
print(f"Average duration: {metrics.avg_duration}s")
print(f"Success rate: {metrics.success_rate}%")
print(f"Most used tools: {metrics.top_tools}")

Deployment

Docker Deployment

# Build image
docker build -t agent-name:v1.0.0 .

# Run container
docker run -d \
  --name agent-name \
  -p 8000:8000 \
  -v $(pwd)/logs:/app/logs \
  -e OLLAMA_HOST=http://host.docker.internal:11434 \
  agent-name:v1.0.0

# Check logs
docker logs -f agent-name

API Server

The agent includes a FastAPI server:

# Start server
python -m agent.server

# Server runs on http://localhost:8000

API Endpoints:

# Health check
curl http://localhost:8000/health

# Run agent
curl -X POST http://localhost:8000/run \
  -H "Content-Type: application/json" \
  -d '{"query": "Your question here"}'

# Start conversation
curl -X POST http://localhost:8000/conversations \
  -H "Content-Type: application/json" \
  -d '{"user_id": "12345"}'

Troubleshooting

Common Issues

Issue: Agent times out

Cause: Task too complex or tools too slow
Solution: Increase timeout_seconds in config.yaml or optimize tools

Issue: Tool calls fail

Cause: Tool errors or invalid inputs
Solution: Check logs/ for detailed error messages, verify tool inputs

Issue: Poor responses

Cause: Prompt issues or model limitations
Solution: Review prompts/system.txt, add few-shot examples

Issue: Memory errors

Cause: Context window exceeded
Solution: Enable summarize_old in memory config

Debug Mode

# Enable debug logging
agent = CustomerSupportAgent(debug=True)

# This logs:
# - Full prompts sent to LLM
# - Tool inputs/outputs
# - Reasoning steps
# - Error traces

Development Workflow

Prototype → Production Checklist

Versioning Strategy

Semantic Versioning:

v1.0.0 - Major: Breaking API changes
v1.1.0 - Minor: New features, backward compatible
v1.1.1 - Patch: Bug fixes

Prompt Versioning:

prompts/
├── v1/
│   ├── system.txt
│   └── user_template.txt
└── v2/
    ├── system.txt
    └── user_template.txt

Best Practices

Agent Design

Keep It Focused:

One agent, one clear purpose
Don't build "do everything" agents
Prefer specialized agents with clear domains

Fail Gracefully:

Validate inputs before tool calls
Handle tool failures without crashing
Provide helpful error messages
Always return a response (even if degraded)

Be Observable:

Log all tool calls
Track metrics consistently
Make debugging easy
Provide clear status indicators

Prompt Engineering

Version Control Prompts:

Keep prompts in separate files
Use git for prompt versioning
Document prompt changes
A/B test prompt variations

Structure System Prompts:

1. Role definition
2. Capabilities and limitations
3. Tool descriptions
4. Response format requirements
5. Behavioral guidelines

Tool Development

Make Tools Atomic:

One tool, one clear function
Avoid tool interdependencies
Return structured data
Include success/failure indicators

Validate Everything:

Check inputs before execution
Validate outputs before returning
Handle edge cases explicitly
Provide clear error messages

Performance Optimization

Response Time

Target: <5 seconds for typical queries

Strategies:

Cache tool results - Don't repeat identical queries
Parallel tool calls - When tools are independent
Optimize prompts - Shorter prompts = faster inference
Early stopping - Return when confident enough

Memory Management

Context Window Management:

# Summarize old messages when context is full
if context_length > max_context:
    summary = summarize_messages(old_messages)
    context = [summary] + recent_messages

Cost Optimization

Token Usage:

Monitor tokens per conversation
Set appropriate max_tokens
Use smaller models when possible
Cache frequent queries

Security Considerations

Input Validation:

Sanitize all user inputs
Validate tool parameters
Prevent injection attacks
Limit query length

Access Control:

Authenticate API requests
Authorize tool access per user
Audit sensitive operations
Rate limit requests

Data Privacy:

Don't log sensitive data
Encrypt stored conversations
Comply with privacy regulations
Provide data deletion

What's Next

You now have production-ready AI agent templates with standardized structure, comprehensive testing, and deployment readiness. Your agents are well-architected, observable, and maintainable.

In Part 5: Ollama Model Management and Workflow Integration, we'll complete the series by covering:

Ollama model lifecycle management
Custom Modelfile patterns
Model versioning and registry
Integration with agents and experiments
Complete workflow automation

We'll tie together all the pieces—workspace structure, documentation, experiments, and agents—into a unified Ollama-centered workflow.

Key Takeaways

Two-track system separates prototypes from production agents
Standardized templates ensure consistent agent structure
Comprehensive testing makes agents reliable and maintainable
Clear architecture with separated concerns (agent/tools/memory/prompts)
Production readiness includes deployment, monitoring, and security
Documentation makes agents usable by others

Resources

Templates:

Agent README template (provided above)
Agent structure (directory layout)
Test templates (unit/integration/behavior)

Code Examples:

Basic agent implementation
Tool development pattern
Memory management
API server

Series Navigation

Previous: Part 3: Experiment Tracking
Next: Part 5: Ollama Integration (coming soon)
Series Home: Building a Production ML Workspace on GPU Infrastructure

Questions or suggestions? Find me on Twitter @bioinfo or at rundatarun.io

About the Author: Justin Johnson builds AI systems and writes about practical AI development.

justinhjohnson.com | Twitter | LinkedIn | Run Data Run | Subscribe

Related experiments

Apparatus

1,555 words · 10 min read

ai-agents
agent-development
ml-development
best-practices
production-systems