Building a Production ML Workspace: Part 5 - Team Collaboration and Workflow Integration
You've built the foundation: workspace structure, documentation systems, experiment tracking, and production-ready agents. Your individual workflows are solid. But ML development is a team sport.
When multiple researchers share GPU infrastructure, collaborate on experiments, and deploy agents to production, new challenges emerge:
- How do team members discover and reuse each other's work?
- How do you prevent conflicts when multiple people experiment simultaneously?
- How do you maintain consistency across different developer environments?
- How do you automate the workflow from experiment to production deployment?
- How do you handle model versioning and Ollama model lifecycle management?
This final article completes the series by showing you how to integrate everything into a collaborative, automated workflow that scales from solo research to full team production.
This is Part 5 (final) of a 5-part series on building production ML workspaces:
- Part 1: Workspace Structure (Designing an Organized Structure, Oct 19, 2025). Learn how to design a scalable ML workspace structure that handles Ollama models, fine-tuning, agents, and experiments without becoming chaotic.
- Part 2: Documentation Systems (Documentation Systems That Scale, Oct 19, 2025). Build a three-tier documentation system that captures ML work for debugging, review, and blog content, turning your experiments into shareable knowledge.
- Part 3: Experiment Tracking (Experiment Tracking and Reproducibility, Oct 19, 2025). Build systematic experiment tracking with templates, progress monitoring, and lifecycle management to ensure every ML experiment is reproducible and builds toward knowledge.
- Part 4: Production-Ready AI Agent Templates (Oct 19, 2025). Build production-ready AI agents with standardized templates, tool integration patterns, comprehensive testing, and deployment readiness frameworks.
This article ties everything together with team collaboration and workflow automation.
The Team Collaboration Problem
Individual productivity is different from team effectiveness. Here's what breaks down when teams scale:
Discovery Problems:
- "Did someone already try this approach?"
- "Where's the trained model Sarah mentioned?"
- "Which agent template should I start from?"
- "What experiments ran last week?"
Conflict Problems:
- Two researchers overwrite each other's experiments
- Model files conflict in shared directories
- GPU allocation conflicts during training
- Inconsistent Python environments cause "works on my machine"
Quality Problems:
- No peer review before production deployment
- Undocumented experiments no one can reproduce
- Agents deployed without proper testing
- Configuration drift across environments
Workflow Problems:
- Manual steps between experiment and deployment
- No clear promotion path from prototype to production
- Unclear ownership of models and agents
- No standardized release process
The Integrated Workflow Solution
Our solution: Automated workflows with clear promotion paths and team visibility.
Integrated ML Workflow
│
├── Individual Development
│ ├── Local experiments with tracking
│ ├── Personal branches for prototypes
│ ├── Automated environment setup
│ └── Self-service model management
│
├── Team Collaboration
│ ├── Shared experiment registry
│ ├── Code review for production code
│ ├── Automated testing gates
│ └── Centralized model registry
│
├── Production Pipeline
│ ├── Automated deployment
│ ├── Model versioning
│ ├── Monitoring & alerting
│ └── Rollback capabilities
│
└── Governance
    ├── Resource allocation
    ├── Cost tracking
    ├── Compliance & security
    └── Knowledge sharing
Part 1: Version Control Strategy
Repository Structure
Monorepo approach for shared workspace:
ml-workspace/
├── .git/
├── .github/
│ └── workflows/ # CI/CD automation
│ ├── experiment-validation.yml
│ ├── agent-tests.yml
│ └── deploy-production.yml
├── experiments/
│ ├── active/ # Current experiments
│ │ └── [researcher]/ # Personal namespace
│ └── archive/ # Completed experiments
├── agents/
│ ├── prototypes/ # WIP agents (no review needed)
│ └── production/ # Reviewed production agents
├── models/
│ ├── registry.yaml # Model catalog
│ └── checkpoints/ # Versioned model files
├── shared/ # Team utilities
│ ├── tools/ # Common tools
│ ├── prompts/ # Reusable prompts
│ └── configs/ # Standard configs
├── docs/
│ ├── runbooks/ # Operational guides
│ └── adrs/ # Architecture decisions
└── scripts/
    ├── setup-env.sh # Environment setup
    ├── sync-ollama.sh # Model sync
    └── deploy-agent.sh # Deployment automation
Branching Strategy
Branch Types:
main # Production-ready code
├── develop # Integration branch
├── experiment/* # Individual experiments
├── feature/* # New capabilities
└── hotfix/* # Production fixes
Workflow:
# Start new experiment
git checkout develop
git checkout -b experiment/username/model-comparison
# Work on experiment
# ... run experiments, document results ...
# Share experiment (no merge)
git push origin experiment/username/model-comparison
# Promote to production (requires review)
git checkout develop
git merge experiment/username/model-comparison
# ... PR review, tests pass ...
git checkout main
git merge develop
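The naming convention above is easy to drift from, so it can help to check it in a hook or CI step. The following is a minimal sketch, not part of the original workflow: the prefixes come from the branch list, while the allowed characters and the `is_valid_branch` helper name are assumptions.

```python
import re

# Prefixes come from the branch list above; the character rules
# (lowercase letters, digits, dots, underscores, hyphens) are an assumption.
_SEGMENT = r"[a-z0-9][a-z0-9._-]*"
BRANCH_PATTERNS = [
    re.compile(r"^(main|develop)$"),
    re.compile(rf"^experiment/{_SEGMENT}/{_SEGMENT}$"),  # experiment/<user>/<topic>
    re.compile(rf"^(feature|hotfix)/{_SEGMENT}$"),
]


def is_valid_branch(name: str) -> bool:
    """Return True if the branch name matches one of the agreed patterns."""
    return any(pattern.match(name) for pattern in BRANCH_PATTERNS)
```

A pre-push hook could call `is_valid_branch` on the current branch and refuse the push when it returns False.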
What to Commit vs. What to Ignore
.gitignore Configuration:
# Commit these:
# - Experiment code
# - Configuration files
# - Documentation
# - Small reference datasets (<10MB)
# - Model registry metadata
# Ignore these (use .gitignore):
*.pyc
__pycache__/
.ipynb_checkpoints/
*.log
.env
# Large files
*.pth
*.safetensors
*.gguf
datasets/large/
models/checkpoints/*.bin
# Experiment artifacts (tracked separately)
# (** matches the active/<researcher>/<experiment> nesting)
experiments/**/outputs/
experiments/**/runs/
mlruns/
wandb/
# Personal configs
.vscode/
.idea/
*.swp
Large File Strategy:
# Use Git LFS for model files
git lfs track "*.pth"
git lfs track "*.safetensors"
# Or use external storage with manifests
models/
├── registry.yaml # Committed (metadata only)
└── checkpoints/
    └── .gitignore # Ignore actual files
# Actual files stored in shared NAS or S3
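When checkpoints live outside Git, a committed manifest lets teammates verify that the file they downloaded is the one the registry refers to. Here is a minimal sketch of building such a manifest with SHA-256 digests; the function names and the manifest layout are illustrative assumptions, not something the series prescribes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(checkpoint_dir: Path) -> dict:
    """Map each file under checkpoint_dir to its size and SHA-256 digest."""
    manifest = {}
    for path in sorted(checkpoint_dir.rglob("*")):
        if path.is_file() and path.name != ".gitignore":
            rel = path.relative_to(checkpoint_dir).as_posix()
            manifest[rel] = {"bytes": path.stat().st_size, "sha256": sha256_of(path)}
    return manifest
```

The result of `build_manifest(Path("models/checkpoints"))` can be dumped to a small JSON or YAML file and committed next to `registry.yaml`.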
Part 2: Ollama Model Management
Model Registry System
Centralized catalog of all models:
models/registry.yaml:
models:
  llama3.1-8b-base:
    source: "ollama"
    model_name: "llama3.1:8b"
    version: "latest"
    purpose: "General purpose chat and reasoning"
    tags: ["base", "chat", "reasoning"]
    owners: ["team"]
    created: "2024-10-01"
    updated: "2024-10-15"
  medical-assistant-v2:
    source: "custom"
    base_model: "llama3.1:8b"
    modelfile: "modelfiles/medical-assistant-v2.txt"
    version: "v2.1.0"
    purpose: "Medical query assistant with RAG"
    tags: ["custom", "medical", "rag"]
    owners: ["sarah"]
    created: "2024-10-10"
    updated: "2024-10-18"
    performance:
      accuracy: 0.89
      latency_p50: "850ms"
      latency_p95: "1.2s"
  code-reviewer-v1:
    source: "custom"
    base_model: "codellama:13b"
    modelfile: "modelfiles/code-reviewer-v1.txt"
    version: "v1.0.0"
    purpose: "Code review and security analysis"
    tags: ["custom", "code", "security"]
    owners: ["john"]
    created: "2024-10-15"
    status: "production"
Modelfile Version Control
modelfiles/medical-assistant-v2.txt:
FROM llama3.1:8b
# System prompt
SYSTEM """You are a medical research assistant with expertise in clinical
trials and biomedical literature. Provide accurate, evidence-based responses
with citations when possible. Always acknowledge uncertainty."""
# Parameters optimized for medical domain
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
# Custom stop sequences
PARAMETER stop "###"
PARAMETER stop "<END>"
# Template for structured responses
TEMPLATE """### Question: {{ .Prompt }}
### Response:
{{ .Response }}
### Confidence: [High/Medium/Low]
### Citations: [If applicable]
"""
Model Sync Script
scripts/sync-ollama.sh:
#!/bin/bash
# Sync Ollama models across team based on registry
set -e
REGISTRY_FILE="models/registry.yaml"
MODELFILES_DIR="modelfiles"
echo "🔄 Syncing Ollama models from registry..."
# Parse registry and pull/build models
# (Requires yq for YAML parsing)
# Pull base models
echo "📥 Pulling base models..."
yq eval '.models[] | select(.source == "ollama") | .model_name' "$REGISTRY_FILE" | \
while read -r model; do
echo " Pulling $model..."
ollama pull "$model"
done
# Build custom models from Modelfiles
echo "🔨 Building custom models..."
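# Custom entries have no model_name field, so the registry key is used as the model name
yq eval '.models | to_entries | .[] | select(.value.source == "custom") | [.key, .value.modelfile] | @tsv' "$REGISTRY_FILE" | \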
while IFS=$'\t' read -r name modelfile; do
if [ -f "$modelfile" ]; then
echo " Building $name from $modelfile..."
ollama create "$name" -f "$modelfile"
else
echo " ⚠️ Modelfile not found: $modelfile"
fi
done
echo "✅ Model sync complete!"
echo ""
echo "Available models:"
ollama list
Model Lifecycle Management
Model States:
Development → Testing → Staging → Production → Deprecated
↓ ↓ ↓ ↓ ↓
Experiment Validation Preview Live Archived
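The lifecycle reads as a strict sequence, which can be encoded so that a promotion tool rejects skipped stages. This is a small sketch under two assumptions that the article does not state explicitly: promotions move exactly one stage forward, and any non-deprecated model may be retired directly.

```python
# Stage order taken from the lifecycle above.
STAGES = ["development", "testing", "staging", "production", "deprecated"]


def can_promote(current: str, target: str) -> bool:
    """Allow only a move to the next stage, plus retirement from any stage.

    The one-step rule and the 'deprecate from anywhere' rule are assumptions.
    """
    if current not in STAGES or target not in STAGES:
        return False
    if target == "deprecated":
        return current != "deprecated"
    return STAGES.index(target) == STAGES.index(current) + 1
```

With these rules, `can_promote("testing", "staging")` is allowed while `can_promote("development", "production")` is rejected.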
Promotion Script:
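#!/bin/bash
# scripts/promote-model.sh
# Stop on the first failing check so a failed test blocks the promotion
set -euo pipefail
if [ "$#" -ne 3 ]; then
    echo "Usage: promote-model.sh <model_name> <from_env> <to_env>" >&2
    exit 1
fi
MODEL_NAME=$1
FROM_ENV=$2
TO_ENV=$3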
echo "🚀 Promoting model: $MODEL_NAME"
echo " From: $FROM_ENV → To: $TO_ENV"
# Validation checks
case $TO_ENV in
testing)
echo "✓ Running unit tests..."
pytest tests/models/test_${MODEL_NAME}.py
;;
staging)
echo "✓ Running integration tests..."
pytest tests/integration/test_${MODEL_NAME}_integration.py
echo "✓ Performance benchmarks..."
python scripts/benchmark-model.py "$MODEL_NAME"
;;
production)
echo "✓ Final validation..."
python scripts/validate-production-ready.py "$MODEL_NAME"
echo "✓ Creating backup..."
# Backup current production model
;;
esac
# Update registry
echo "📝 Updating registry..."
python scripts/update-registry.py "$MODEL_NAME" --environment "$TO_ENV"
echo "✅ Promotion complete!"
Part 3: Experiment Collaboration
Shared Experiment Registry
Team dashboard for all experiments:
scripts/generate-experiment-dashboard.py:
#!/usr/bin/env python3
"""Generate team experiment dashboard"""
from datetime import datetime
from pathlib import Path

import pandas as pd
import yaml


def scan_experiments():
    """Scan all experiments and build registry"""
    experiments = []
    active_dir = Path("experiments/active")
    for researcher_dir in active_dir.iterdir():
        if not researcher_dir.is_dir():
            continue
        researcher = researcher_dir.name
        for exp_dir in researcher_dir.iterdir():
            metadata_file = exp_dir / "metadata.yaml"
            if not metadata_file.exists():
                continue
            with open(metadata_file) as f:
                meta = yaml.safe_load(f) or {}
            experiments.append({
                "researcher": researcher,
                "experiment": exp_dir.name,
                "goal": meta.get("goal", "N/A"),
                "status": meta.get("status", "unknown"),
                # Stringify dates so quoted and unquoted YAML dates sort together
                "created": str(meta.get("created") or ""),
                "updated": str(meta.get("updated") or ""),
                "tags": meta.get("tags", []),
                "best_metric": (meta.get("results") or {}).get("best_metric"),
            })
    return experiments


def generate_dashboard(experiments):
    """Generate HTML dashboard"""
    columns = ["researcher", "experiment", "goal", "status",
               "created", "updated", "tags", "best_metric"]
    # Fixed columns keep sort_values() working when no experiments exist yet
    df = pd.DataFrame(experiments, columns=columns)
    recent = df.sort_values("updated", ascending=False).head(10)
    html = f"""
<html>
<head><title>Team Experiments Dashboard</title></head>
<body>
<h1>ML Team Experiments</h1>
<p>Last updated: {datetime.now().strftime('%Y-%m-%d %H:%M')}</p>
<h2>Active Experiments ({len(df)})</h2>
{df.to_html(index=False)}
<h2>Recent Activity</h2>
{recent.to_html(index=False)}
</body>
</html>
"""
    with open("docs/experiment-dashboard.html", "w") as f:
        f.write(html)
    print("✅ Dashboard generated: docs/experiment-dashboard.html")


if __name__ == "__main__":
    experiments = scan_experiments()
    generate_dashboard(experiments)
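The same records can answer the discovery question "did someone already try this approach?" without opening the HTML page. The helper below is a hypothetical addition that filters the list of dicts produced by `scan_experiments()`; the `find_experiments` name and its filter arguments are assumptions.

```python
def find_experiments(experiments, tag=None, researcher=None, status=None):
    """Return the records that match every filter that was supplied."""
    matches = []
    for exp in experiments:
        if tag is not None and tag not in (exp.get("tags") or []):
            continue
        if researcher is not None and exp.get("researcher") != researcher:
            continue
        if status is not None and exp.get("status") != status:
            continue
        matches.append(exp)
    return matches
```

For example, `find_experiments(scan_experiments(), tag="rag")` would list every active experiment tagged `rag`, whoever owns it.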
Experiment Handoff Process
When handing off experiments between team members:
1. Document thoroughly:
# experiments/active/sarah/medical-rag/HANDOFF.md
## Experiment Handoff
**From:** Sarah Johnson
**To:** Mike Chen
**Date:** 2024-10-19
### Current Status
- Completed initial RAG pipeline with llama3.1:8b
- Best accuracy: 89% on validation set
- Main bottleneck: Retrieval latency (avg 850ms)
### What Works
- Document chunking strategy (500 tokens, 50 overlap)
- Embedding model: all-MiniLM-L6-v2
- Reranking with cross-encoder significantly improves results
### What Doesn't Work
- ChromaDB too slow for >100K documents
- Need better medical entity recognition
- Current prompt struggles with complex multi-hop questions
### Next Steps
1. Try Qdrant or Milvus for vector store
2. Fine-tune NER model on medical corpus
3. Implement query decomposition for complex questions
### Files to Review
- `src/rag_pipeline.py` - Main pipeline
- `experiments/results/analysis.ipynb` - Performance analysis
- `docs/architecture.md` - System design
### How to Run
```bash
conda activate medical-rag
python src/train.py --config configs/baseline.yaml
python src/evaluate.py --checkpoint outputs/best_model.pth
```
### Questions?
Slack: @sarah or email: sarah@company.com
2. Pair programming session:
- 30-60 minute walkthrough
- Run the experiment together
- Answer questions in real-time
3. Update documentation:
- Ensure README is current
- Add inline comments for complex logic
- Update architecture diagrams
Part 4: Environment Consistency
Reproducible Environments
environment.yaml (conda):
name: ml-workspace
channels:
  # pytorch-cuda is published on the pytorch channel and pulls CUDA packages from nvidia
  - pytorch
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - pytorch=2.1.0
  - pytorch-cuda=12.1
  - transformers=4.35.0
  - numpy=1.24.0
  - pandas=2.1.0
  - scikit-learn=1.3.0
  - jupyter=1.0.0
  - pip
  - pip:
    - ollama==0.1.7
    - chromadb==0.4.15
    - langchain==0.0.335
    - wandb==0.15.12
Setup Script:
#!/bin/bash
# scripts/setup-env.sh
echo "🔧 Setting up ML workspace environment..."
# Check prerequisites
command -v conda >/dev/null 2>&1 || {
echo "❌ Conda not found. Please install Miniconda/Anaconda first."
exit 1
}
command -v ollama >/dev/null 2>&1 || {
echo "❌ Ollama not found. Please install from ollama.ai"
exit 1
}
# Create conda environment
echo "📦 Creating conda environment..."
conda env create -f environment.yaml
# Activate environment
echo "🔄 Activating environment..."
eval "$(conda shell.bash hook)"
conda activate ml-workspace
# Sync Ollama models
echo "🤖 Syncing Ollama models..."
bash scripts/sync-ollama.sh
# Initialize experiment tracking
echo "📊 Setting up experiment tracking..."
python scripts/init-mlflow.py
# Verify installation
echo "✅ Verifying installation..."
python -c "import torch; print(f'PyTorch: {torch.__version__}')"
python -c "import transformers; print(f'Transformers: {transformers.__version__}')"
ollama list
echo ""
echo "✅ Setup complete!"
echo " Activate with: conda activate ml-workspace"
Docker for Production Agents
agents/production/[agent]/Dockerfile:
FROM python:3.11-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install Ollama client
RUN pip install ollama
# Copy application
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Health check
# curl -f returns a non-zero exit code on HTTP errors, so a failing endpoint marks the container unhealthy
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
# Run agent server
CMD ["python", "-m", "agent.server"]
docker-compose.yml:
version: '3.8'
services:
  agent:
    build: .
    container_name: medical-assistant
    ports:
      - "8000:8000"
    environment:
      - OLLAMA_HOST=http://ollama:11434
      - LOG_LEVEL=INFO
    volumes:
      - ./logs:/app/logs
      - ./config.yaml:/app/config.yaml
    depends_on:
      - ollama
    restart: unless-stopped
  ollama:
    image: ollama/ollama:latest
    container_name: ollama-server
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
volumes:
  ollama-data:
Part 5: CI/CD Automation
GitHub Actions for Experiments
.github/workflows/experiment-validation.yml:
name: Validate Experiment
on:
  push:
    paths:
      - 'experiments/**'
  pull_request:
    paths:
      - 'experiments/**'
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Check metadata exists
        run: |
          python scripts/validate-experiment-metadata.py
      - name: Verify documentation
        run: |
          python scripts/check-experiment-docs.py
      - name: Run linting
        run: |
          pip install ruff
          ruff check experiments/
      - name: Test experiment code
        run: |
          pip install pytest
          # Tests live under experiments/active/<researcher>/<experiment>/tests/
          pytest experiments/ -v
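The workflow calls `scripts/validate-experiment-metadata.py`, which this article does not show. As a rough sketch of what its core check might look like, the function below validates an already-parsed metadata dict. The required fields are the keys the dashboard script reads, and the set of known statuses is an assumption based on the values used elsewhere in this series.

```python
REQUIRED_FIELDS = ("goal", "status", "created", "updated")
KNOWN_STATUSES = {"in_progress", "completed"}  # assumption: extend as needed


def validate_metadata(meta: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not meta.get(field):
            problems.append(f"missing or empty field: {field}")
    status = meta.get("status")
    if status and status not in KNOWN_STATUSES:
        problems.append(f"unknown status: {status}")
    if "tags" in meta and not isinstance(meta["tags"], list):
        problems.append("tags must be a list")
    return problems
```

A CI script could load each `metadata.yaml`, print the returned problems, and exit non-zero if any list is non-empty.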
Agent Testing Pipeline
.github/workflows/agent-tests.yml:
name: Agent Tests
on:
  pull_request:
    paths:
      - 'agents/production/**'
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      ollama:
        image: ollama/ollama:latest
        ports:
          - 11434:11434
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          # The PR branch name is not an agent directory, so install every agent's requirements
          for req in agents/production/*/requirements.txt; do
            pip install -r "$req"
          done
          pip install pytest pytest-cov
      - name: Pull required models
        run: |
          # The ollama CLI lives in the service container, not on the runner
          docker exec "${{ job.services.ollama.id }}" ollama pull llama3.1:8b
      - name: Run unit tests
        run: |
          pytest agents/production/*/tests/ \
            --cov=agents/production \
            --cov-report=xml \
            --cov-fail-under=80
      - name: Run integration tests
        run: |
          pytest agents/production/*/tests/test_integration.py -v
      - name: Security scan
        run: |
          pip install bandit
          bandit -r agents/production/
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage.xml
Production Deployment Pipeline
.github/workflows/deploy-production.yml:
name: Deploy to Production
on:
  push:
    branches:
      - main
    paths:
      - 'agents/production/**'
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    # AGENT_NAME, REGISTRY, and VERSION are not defined in this file; supply them
    # through a job-level env block or repository variables before using it
    steps:
      - uses: actions/checkout@v3
      - name: Build Docker image
        run: |
          cd "agents/production/$AGENT_NAME"
          docker build -t "$REGISTRY/$AGENT_NAME:$VERSION" .
      - name: Run smoke tests
        run: |
          docker run --rm "$REGISTRY/$AGENT_NAME:$VERSION" python -m pytest tests/smoke/
      - name: Push to registry
        run: |
          echo "${{ secrets.REGISTRY_TOKEN }}" | docker login -u "${{ secrets.REGISTRY_USER }}" --password-stdin
          docker push "$REGISTRY/$AGENT_NAME:$VERSION"
      - name: Deploy to Kubernetes
        run: |
          kubectl set image "deployment/$AGENT_NAME" \
            "$AGENT_NAME=$REGISTRY/$AGENT_NAME:$VERSION"
      - name: Verify deployment
        run: |
          kubectl rollout status "deployment/$AGENT_NAME"
          kubectl get pods -l "app=$AGENT_NAME"
      - name: Smoke test production
        run: |
          python scripts/smoke-test-prod.py "$AGENT_NAME"
Part 6: Team Workflows
Daily Standup Dashboard
scripts/generate-standup.py:
#!/usr/bin/env python3
"""Generate daily standup report"""
from datetime import datetime, timedelta
import subprocess
import yaml
from pathlib import Path


def get_recent_commits():
    """Get commits from last 24 hours"""
    result = subprocess.run(
        ['git', 'log', '--since=24 hours ago', '--pretty=format:%an|%s'],
        capture_output=True, text=True
    )
    # Split on the first '|' only, because commit subjects may contain '|'
    commits = [
        line.split('|', 1)
        for line in result.stdout.strip().split('\n')
        if '|' in line
    ]
    return commits


def get_active_experiments():
    """Get experiments updated in last 24 hours"""
    experiments = []
    exp_dir = Path("experiments/active")
    for researcher_dir in exp_dir.iterdir():
        if not researcher_dir.is_dir():
            continue  # skip stray files such as .gitkeep
        for exp in researcher_dir.iterdir():
            metadata = exp / "metadata.yaml"
            if not metadata.exists():
                continue
            mtime = datetime.fromtimestamp(metadata.stat().st_mtime)
            if mtime > datetime.now() - timedelta(days=1):
                with open(metadata) as f:
                    meta = yaml.safe_load(f) or {}
                experiments.append({
                    'researcher': researcher_dir.name,
                    'experiment': exp.name,
                    'status': meta.get('status'),
                    'goal': meta.get('goal')
                })
    return experiments


def generate_report():
    """Generate standup report"""
    print("📊 Daily Standup Report")
    print(f"📅 {datetime.now().strftime('%Y-%m-%d')}")
    print("=" * 50)
    print("\n🚀 Recent Commits:")
    commits = get_recent_commits()
    for author, message in commits[:10]:
        print(f" • {author}: {message}")
    print("\n🧪 Active Experiments:")
    experiments = get_active_experiments()
    for exp in experiments:
        print(f" • {exp['researcher']}: {exp['experiment']}")
        print(f"   Goal: {exp['goal']}")
        print(f"   Status: {exp['status']}")
    print("\n📈 MLflow Experiments:")
    # Query MLflow for recent runs
    print(" Run: python scripts/query-mlflow.py --since yesterday")


if __name__ == "__main__":
    generate_report()
Weekly Team Review
scripts/weekly-review.py:
#!/usr/bin/env python3
"""Generate weekly team review"""
from datetime import datetime, timedelta
from pathlib import Path
import yaml


def analyze_experiments():
    """Analyze experiment activity"""
    week_ago = datetime.now() - timedelta(days=7)
    stats = {
        'total_experiments': 0,
        'completed': 0,
        'in_progress': 0,
        'by_researcher': {}
    }
    exp_dir = Path("experiments/active")
    for researcher_dir in exp_dir.iterdir():
        if not researcher_dir.is_dir():
            continue  # skip stray files such as .gitkeep
        researcher = researcher_dir.name
        stats['by_researcher'][researcher] = 0
        for exp in researcher_dir.iterdir():
            metadata = exp / "metadata.yaml"
            if not metadata.exists():
                continue
            with open(metadata) as f:
                meta = yaml.safe_load(f) or {}
            # str() handles unquoted YAML dates, which load as date objects
            created = datetime.fromisoformat(str(meta.get('created', '2000-01-01')))
            if created > week_ago:
                stats['total_experiments'] += 1
                stats['by_researcher'][researcher] += 1
            status = meta.get('status')
            if status == 'completed':
                stats['completed'] += 1
            elif status == 'in_progress':
                stats['in_progress'] += 1
    return stats


def analyze_models():
    """Analyze model registry"""
    with open("models/registry.yaml") as f:
        registry = yaml.safe_load(f) or {}
    models = registry.get('models', {})
    return {
        'total_models': len(models),
        'production_models': sum(
            1 for m in models.values()
            if m.get('status') == 'production'
        ),
        'custom_models': sum(
            1 for m in models.values()
            if m.get('source') == 'custom'
        )
    }


def generate_report():
    """Generate comprehensive weekly report"""
    print("📊 Weekly Team Review")
    print(f"📅 Week ending {datetime.now().strftime('%Y-%m-%d')}")
    print("=" * 70)
    exp_stats = analyze_experiments()
    print("\n🧪 Experiment Activity:")
    print(f" Total new experiments: {exp_stats['total_experiments']}")
    print(f" Completed: {exp_stats['completed']}")
    print(f" In progress: {exp_stats['in_progress']}")
    print("\n By researcher:")
    for researcher, count in exp_stats['by_researcher'].items():
        print(f" • {researcher}: {count} experiments")
    model_stats = analyze_models()
    print("\n🤖 Model Registry:")
    print(f" Total models: {model_stats['total_models']}")
    print(f" Production models: {model_stats['production_models']}")
    print(f" Custom models: {model_stats['custom_models']}")
    print("\n📈 Key Metrics:")
    print(" GPU Utilization: [Query from monitoring]")
    print(" Agent Uptime: [Query from deployment]")
    print(" Experiment Success Rate: [Calculate from results]")
    print("\n💡 Recommendations:")
    print(" • [Auto-generated based on patterns]")
    print(" • Review abandoned experiments")
    print(" • Share successful experiment patterns")


if __name__ == "__main__":
    generate_report()
Code Review Guidelines
docs/runbooks/code-review-checklist.md:
# Code Review Checklist
## For Experiments
### Required
- [ ] Metadata file exists and is complete
- [ ] README documents experiment goal and methodology
- [ ] Configuration is externalized (no hardcoded values)
- [ ] Dependencies listed in requirements.txt
- [ ] Results directory structure follows template
- [ ] Code runs without errors
### Recommended
- [ ] Inline comments explain complex logic
- [ ] Visualization notebooks for results
- [ ] Performance metrics documented
- [ ] Comparison with baseline
## For Production Agents
### Critical (Must Pass)
- [ ] All tests pass (unit, integration, behavior)
- [ ] Test coverage >80%
- [ ] No security vulnerabilities (bandit scan passes)
- [ ] Complete README with usage examples
- [ ] Configuration externalized
- [ ] Logging comprehensive
- [ ] Error handling robust
- [ ] Dockerfile builds successfully
### Important (Should Pass)
- [ ] Code follows team style guide
- [ ] Docstrings for all public functions
- [ ] Type hints for function signatures
- [ ] Performance benchmarks run
- [ ] Metrics collection implemented
- [ ] Health check endpoint works
### Nice to Have
- [ ] Architecture diagram included
- [ ] Troubleshooting guide in README
- [ ] Example use cases demonstrated
- [ ] Monitoring dashboards defined
## Review Process
1. **Self Review**: Author completes checklist before PR
2. **Peer Review**: Team member reviews code
3. **Testing**: CI/CD pipeline validates automatically
4. **Approval**: 1 approval required for experiments, 2 for production agents
5. **Merge**: Squash and merge with descriptive message
Part 7: Resource Management
GPU Allocation
scripts/check-gpu-usage.sh:
#!/bin/bash
# Check GPU usage and availability
echo "🎮 GPU Resource Status"
echo "====================="
# Check NVIDIA GPUs
nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total \
--format=csv,noheader,nounits | \
while IFS=',' read -r index name util mem_used mem_total; do
echo "GPU $index: $name"
echo " Utilization: ${util}%"
echo " Memory: ${mem_used}MB / ${mem_total}MB"
# Check if GPU is idle (< 20% utilization)
if [ "$util" -lt 20 ]; then
echo " Status: ✅ Available"
else
echo " Status: 🔴 Busy"
# Show which process is using it
nvidia-smi --query-compute-apps=pid,process_name,used_memory \
--format=csv,noheader | grep -v "^$" | \
while IFS=',' read -r pid process mem; do
user=$(ps -o user= -p "$pid" 2>/dev/null)
echo " Process: $process (PID: $pid, User: $user)"
done
fi
echo ""
done
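If other Python tooling needs the same information, the CSV output of the `nvidia-smi` query above can be parsed directly. The sketch below works on the text output rather than calling `nvidia-smi` itself; the `parse_gpu_status` name and the returned dict keys are assumptions, and the 20% idle threshold mirrors the shell script. It assumes every numeric column is an integer, so values such as `[N/A]` would need extra handling.

```python
import csv
import io


def parse_gpu_status(csv_text: str, idle_below: int = 20) -> list[dict]:
    """Parse 'index, name, utilization, mem_used, mem_total' rows into dicts."""
    gpus = []
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        if len(row) != 5:
            continue  # skip blank or malformed lines
        index, name, util, mem_used, mem_total = (cell.strip() for cell in row)
        gpus.append({
            "index": int(index),
            "name": name,
            "utilization": int(util),
            "memory_used_mb": int(mem_used),
            "memory_total_mb": int(mem_total),
            "available": int(util) < idle_below,
        })
    return gpus
```

The input text can come from `subprocess.run([...], capture_output=True, text=True).stdout` using the same query flags as the shell script.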
Resource Reservation System
scripts/reserve-gpu.py:
#!/usr/bin/env python3
"""GPU reservation system"""
import json
import sys
from datetime import datetime, timedelta
from pathlib import Path

RESERVATIONS_FILE = "shared/gpu-reservations.json"


def load_reservations():
    """Load current reservations"""
    if Path(RESERVATIONS_FILE).exists():
        with open(RESERVATIONS_FILE) as f:
            return json.load(f)
    return {"reservations": []}


def save_reservations(data):
    """Save reservations"""
    with open(RESERVATIONS_FILE, 'w') as f:
        json.dump(data, f, indent=2)


def reserve_gpu(gpu_id, user, duration_hours, purpose):
    """Reserve a GPU"""
    # Note: load-then-save is not locked, so two simultaneous reservations can race
    data = load_reservations()
    # Check if GPU is already reserved
    now = datetime.now()
    for res in data['reservations']:
        if res['gpu_id'] == gpu_id:
            end_time = datetime.fromisoformat(res['end_time'])
            if end_time > now:
                print(f"❌ GPU {gpu_id} is reserved by {res['user']} until {end_time}")
                return False
    # Create reservation
    reservation = {
        'gpu_id': gpu_id,
        'user': user,
        'purpose': purpose,
        'start_time': now.isoformat(),
        'end_time': (now + timedelta(hours=duration_hours)).isoformat()
    }
    data['reservations'].append(reservation)
    save_reservations(data)
    print(f"✅ GPU {gpu_id} reserved for {user}")
    print(f"   Duration: {duration_hours} hours")
    print(f"   Until: {reservation['end_time']}")
    return True


def list_reservations():
    """List all current reservations"""
    data = load_reservations()
    now = datetime.now()
    print("📅 Current GPU Reservations")
    print("=" * 50)
    active_reservations = [
        res for res in data['reservations']
        if datetime.fromisoformat(res['end_time']) > now
    ]
    if not active_reservations:
        print("No active reservations")
        return
    for res in active_reservations:
        end = datetime.fromisoformat(res['end_time'])
        remaining = end - now
        print(f"GPU {res['gpu_id']}: {res['user']}")
        print(f"  Purpose: {res['purpose']}")
        print(f"  Time remaining: {remaining}")
        print("")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage:")
        print("  reserve-gpu.py list")
        print("  reserve-gpu.py reserve <gpu_id> <user> <hours> <purpose>")
        sys.exit(1)
    command = sys.argv[1]
    if command == "list":
        list_reservations()
    elif command == "reserve":
        if len(sys.argv) != 6:
            print("Usage: reserve-gpu.py reserve <gpu_id> <user> <hours> <purpose>")
            sys.exit(1)
        gpu_id = int(sys.argv[2])
        user = sys.argv[3]
        hours = int(sys.argv[4])
        purpose = sys.argv[5]
        # A non-zero exit code lets calling scripts detect a refused reservation
        sys.exit(0 if reserve_gpu(gpu_id, user, hours, purpose) else 1)
    else:
        print(f"Unknown command: {command}")
        sys.exit(1)
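Because the reservations file is shared, a process that crashes in the middle of a write could leave truncated JSON behind for everyone else. One common mitigation is to write to a temporary file and swap it into place. The sketch below shows that pattern; `save_json_atomic` is a hypothetical helper rather than part of the script above, and it protects against partial writes only, not against two users reserving at the same moment.

```python
import json
import os
import tempfile
from pathlib import Path


def save_json_atomic(path: str, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        # os.replace is atomic when both paths are on the same filesystem
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)
        raise
```

Readers then see either the old file or the new one, never a half-written mix. Preventing lost updates between concurrent writers would additionally need a lock or a small reservation service.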
Cost Tracking
scripts/track-compute-costs.py:
#!/usr/bin/env python3
"""Track compute costs"""
from pathlib import Path

# Cost per GPU hour (example pricing)
GPU_COST_PER_HOUR = {
    'A100': 3.00,
    'A6000': 2.00,
    'RTX4090': 1.50
}


def calculate_experiment_cost(experiment_dir):
    """Calculate cost for an experiment"""
    metadata_file = Path(experiment_dir) / "metadata.yaml"
    if not metadata_file.exists():
        return 0.0
    # Parse training time from logs or metadata
    # This is simplified - you'd parse actual logs
    training_hours = 2.5  # Example
    gpu_type = "A100"  # From metadata
    cost = training_hours * GPU_COST_PER_HOUR.get(gpu_type, 1.0)
    return cost


def generate_cost_report():
    """Generate monthly cost report"""
    print("💰 Compute Cost Report")
    print("=" * 50)
    total_cost = 0.0
    costs_by_user = {}
    exp_dir = Path("experiments/active")
    for researcher_dir in exp_dir.iterdir():
        if not researcher_dir.is_dir():
            continue  # skip stray files such as .gitkeep
        researcher = researcher_dir.name
        costs_by_user[researcher] = 0.0
        for exp in researcher_dir.iterdir():
            cost = calculate_experiment_cost(exp)
            costs_by_user[researcher] += cost
            total_cost += cost
    print(f"Total cost: ${total_cost:.2f}")
    print("\nBy researcher:")
    for user, cost in sorted(costs_by_user.items(), key=lambda x: x[1], reverse=True):
        print(f"  {user}: ${cost:.2f}")


if __name__ == "__main__":
    generate_cost_report()
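The script above uses placeholder hours. One way to get real numbers without parsing training logs is to reuse the reservation records, which already carry `gpu_id`, `user`, `start_time`, and `end_time`. The sketch below takes that approach; the `gpu_types` mapping from GPU index to model name is an assumption, and reserved time is only an upper bound on time actually spent training.

```python
from datetime import datetime


def reservation_hours(reservation: dict) -> float:
    """Length of one reservation in hours, never negative."""
    start = datetime.fromisoformat(reservation["start_time"])
    end = datetime.fromisoformat(reservation["end_time"])
    return max((end - start).total_seconds(), 0.0) / 3600.0


def cost_by_user(reservations: list[dict], gpu_types: dict[int, str],
                 rates: dict[str, float], default_rate: float = 1.0) -> dict[str, float]:
    """Sum reserved hours times the hourly rate of the reserved GPU, per user."""
    totals: dict[str, float] = {}
    for res in reservations:
        gpu_type = gpu_types.get(res["gpu_id"], "")
        rate = rates.get(gpu_type, default_rate)
        totals[res["user"]] = totals.get(res["user"], 0.0) + reservation_hours(res) * rate
    return totals
```

Passing the `reservations` list from `shared/gpu-reservations.json` together with `GPU_COST_PER_HOUR` as `rates` gives a per-user total that can replace the hard-coded example values.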
Part 8: Complete Workflow Example
Let's see how everything fits together with a real workflow from experiment to production.
Scenario: Building a Medical Q&A Agent
Day 1-2: Initial Experiment
# Sarah starts new experiment
cd ml-workspace
git checkout develop
git checkout -b experiment/sarah/medical-qa
# Setup experiment
mkdir -p experiments/active/sarah/medical-qa
cd experiments/active/sarah/medical-qa
# Initialize experiment
cat > metadata.yaml <<EOF
experiment_id: "medical-qa-v1"
researcher: "sarah"
created: "2024-10-19"
updated: "2024-10-19"
status: "in_progress"
goal: "Build RAG-based medical Q&A system"
hypothesis: "llama3.1:8b with medical corpus RAG can answer clinical questions"
tags: ["rag", "medical", "qa"]
mlflow_experiment: "medical-qa"
EOF
# Create README
cat > README.md <<EOF
# Medical Q&A Experiment
## Goal
Build RAG system for medical question answering.
## Approach
1. Index medical corpus (PubMed abstracts)
2. Implement retrieval with ChromaDB
3. Test llama3.1:8b with retrieved context
4. Evaluate on MedQA benchmark
## Running
\`\`\`bash
python train.py --config config.yaml
python evaluate.py --checkpoint outputs/best_model.pth
\`\`\`
EOF
# Write experiment code
# ... implement RAG pipeline ...
# Track with MLflow
python train.py
# Commit and share
git add .
git commit -m "Initial medical QA RAG experiment"
git push origin experiment/sarah/medical-qa
Day 3-4: Iteration and Results
# Run multiple experiments
mlflow ui # View results
# Best config identified
# - Chunk size: 500 tokens
# - Retrieval: top-5 with reranking
# - Model: llama3.1:8b, temp=0.3
# Document results
cat > results/summary.md <<EOF
# Results Summary
## Best Configuration
- Accuracy: 89%
- Latency p50: 850ms
- Retrieval precision: 0.85
## Conclusion
Ready for prototype agent deployment
EOF
# Update metadata
# status: completed
Day 5: Promote to Prototype Agent
# Create prototype agent
cd agents/prototypes/
mkdir medical-qa
cd medical-qa
# Copy experiment code
cp -r ../../../experiments/active/sarah/medical-qa/src/* ./
# Create agent wrapper
cat > agent.py <<EOF
"""Medical QA Agent - Prototype"""
from rag_pipeline import MedicalRAG


class MedicalQAAgent:
    def __init__(self):
        self.rag = MedicalRAG(model="llama3.1:8b")

    def answer_question(self, question):
        context = self.rag.retrieve(question)
        answer = self.rag.generate(question, context)
        return answer
EOF
# Test prototype
python test_agent.py
# Share with team
git add .
git commit -m "Add medical QA prototype agent"
git push origin experiment/sarah/medical-qa
Week 2: Production Promotion (Team Review)
# Mike volunteers to productionize
git checkout experiment/sarah/medical-qa
cd agents/production/
# Use production template
cp -r templates/agent-template medical-qa-agent
cd medical-qa-agent
# Fill out complete structure
# - Add comprehensive tests
# - Create Dockerfile
# - Write complete README
# - Add configuration
# - Implement monitoring
# Create PR
git checkout -b feature/medical-qa-production
git add .
git commit -m "Production-ready medical QA agent

- Complete test suite (85% coverage)
- Dockerized with health checks
- Comprehensive documentation
- Monitoring and metrics
- Performance benchmarks"
git push origin feature/medical-qa-production
# Open PR on GitHub
# - CI/CD runs automatically
# - Tests pass
# - Sarah reviews and approves
# - Merge to develop
Week 3: Production Deployment
# Merge to main and push; the push is what triggers deployment
git checkout main
git merge develop
git push origin main
# GitHub Actions:
# 1. Build Docker image
# 2. Run tests
# 3. Push to registry
# 4. Deploy to Kubernetes
# 5. Run smoke tests
# Agent now live at https://api.company.com/medical-qa
# Update model registry
cat >> models/registry.yaml <<EOF
medical-qa-v1:
  source: "custom"
  base_model: "llama3.1:8b"
  version: "v1.0.0"
  purpose: "Medical question answering with RAG"
  status: "production"
  owners: ["sarah", "mike"]
  performance:
    accuracy: 0.89
    latency_p50: "850ms"
EOF
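Because registry entries are appended by hand, a small check in CI can catch missing fields before they reach `main`. This is a sketch under stated assumptions: the entry is shown as an already-parsed dict (loading the YAML is left out), the required-field list mirrors the entry above, and `ALLOWED_STATUSES` is a hypothetical vocabulary, since the article only shows `"production"`.

```python
REQUIRED_FIELDS = {"source", "base_model", "version", "purpose", "status", "owners"}
# Hypothetical status vocabulary; the article only shows "production".
ALLOWED_STATUSES = {"experimental", "staging", "production", "deprecated"}


def validate_entry(name, entry):
    """Return a list of human-readable problems; an empty list means it passes."""
    problems = []
    missing = sorted(REQUIRED_FIELDS - set(entry))
    if missing:
        problems.append(f"{name}: missing fields {missing}")
    status = entry.get("status")
    if status is not None and status not in ALLOWED_STATUSES:
        problems.append(f"{name}: unknown status {status!r}")
    if "owners" in entry and not entry["owners"]:
        problems.append(f"{name}: at least one owner is required")
    return problems


entry = {
    "source": "custom",
    "base_model": "llama3.1:8b",
    "version": "v1.0.0",
    "purpose": "Medical question answering with RAG",
    "status": "production",
    "owners": ["sarah", "mike"],
    "performance": {"accuracy": 0.89, "latency_p50": "850ms"},
}
print(validate_entry("medical-qa-v1", entry))  # []
```

Returning a list of problems rather than raising on the first one lets a CI job report every issue in a single run.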
Timeline Summary
- Days 1-4: Individual experiment (fast iteration)
- Day 5: Prototype agent (share with team)
- Week 2: Production hardening (quality focus)
- Week 3: Deployment (automated pipeline)
Total: 3 weeks from idea to production
Best Practices
For Individual Researchers
1. Document as you go
- Write README first (defines what you're building)
- Update metadata.yaml daily
- Commit frequently with good messages
- Screenshot interesting results
2. Follow templates
- Use experiment template
- Use agent template for production
- Don't skip required fields
- Consistency helps others help you
3. Share early
- Push experiments even if incomplete
- Ask for code review on tricky parts
- Present results in team meetings
- Write handoff docs when switching projects
For Team Leads
1. Automate everything
- Use CI/CD for validation
- Generate dashboards automatically
- Automate environment setup
- Script common workflows
2. Maintain standards
- Enforce code review for production
- Require tests for production agents
- Keep templates updated
- Document architecture decisions
3. Foster collaboration
- Weekly demo sessions
- Pair programming for complex features
- Shared Slack channel for questions
- Regular retrospectives
For Production Deployment
1. Progressive rollout
- Deploy to staging first
- Canary deployment (10% → 50% → 100%)
- Monitor metrics closely
- Have rollback plan ready
2. Observability
- Comprehensive logging
- Metrics dashboards
- Alerting on errors
- Performance tracking
3. Documentation
- Runbooks for operations
- Troubleshooting guides
- Architecture diagrams
- API documentation
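The canary progression (10% → 50% → 100%) with a rollback plan can be expressed as a small decision function. The sketch below is illustrative only: `MAX_ERROR_RATE` is a hypothetical threshold, and in practice the error and request counts would come from your monitoring system.

```python
# Stages from the article; the error budget is a hypothetical threshold.
STAGES = [10, 50, 100]
MAX_ERROR_RATE = 0.01


def next_traffic_percent(current, errors, requests):
    """Return the traffic share for the next step: advance, hold, or roll back to 0."""
    if current not in STAGES:
        raise ValueError(f"unknown stage: {current}")
    if requests == 0:
        return current  # no evidence yet, hold the current stage
    if errors / requests > MAX_ERROR_RATE:
        return 0  # roll back
    index = STAGES.index(current)
    return STAGES[min(index + 1, len(STAGES) - 1)]


print(next_traffic_percent(10, errors=2, requests=1000))   # 50
print(next_traffic_percent(50, errors=30, requests=1000))  # 0
```

Keeping the decision in one pure function makes the rollout policy easy to review and test independently of the deployment tooling.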
Common Pitfalls to Avoid
Don't
❌ Skip documentation - Future you (and your team) will regret it
❌ Commit large binary files - Use Git LFS or external storage
❌ Work directly on main - Always use feature branches
❌ Deploy without tests - Tests prevent production disasters
❌ Hardcode secrets - Use environment variables or secret management
❌ Ignore failed CI - Fix immediately, don't accumulate debt
❌ Deploy without monitoring - You need visibility in production
❌ Skip code review - Fresh eyes catch bugs and improve design
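For the hardcoded-secrets pitfall, a minimal sketch of the environment-variable approach follows. `require_secret` and `REGISTRY_TOKEN` are hypothetical names; the mapping is passed in explicitly here only so the example runs anywhere, and by default the function reads `os.environ`.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_secret(name, environ=None):
    """Return the value of `name`, failing loudly instead of using a default."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "")
    if not value:
        raise MissingSecretError(f"environment variable {name} is not set")
    return value


# Hypothetical variable name and value, supplied explicitly for the demo.
token = require_secret("REGISTRY_TOKEN", {"REGISTRY_TOKEN": "example-value"})
```

Failing at startup when a secret is missing is usually preferable to falling back to a default, which tends to hide misconfiguration until production.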
Do
✅ Use templates consistently - Reduces cognitive load
✅ Automate repetitive tasks - Scripts save time and reduce errors
✅ Version everything - Code, models, prompts, configs
✅ Communicate proactively - Share blockers and successes
✅ Test thoroughly - Unit, integration, and end-to-end
✅ Monitor production - Metrics, logs, alerts
✅ Document decisions - ADRs explain the "why"
✅ Iterate quickly - Prototype → Test → Production
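"Version everything" is easier to enforce when prompts and configs get a deterministic fingerprint that can be logged next to each run. One possible sketch, assuming the artifacts are JSON-serializable; `artifact_version` is a hypothetical helper and the prompt text is a placeholder:

```python
import hashlib
import json


def artifact_version(payload):
    """Return a short, deterministic fingerprint for a JSON-serializable artifact."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Config values from the walkthrough; the prompt text is a hypothetical placeholder.
config = {"model": "llama3.1:8b", "temperature": 0.3, "chunk_size": 500, "top_k": 5}
prompt = {"template": "Answer using only the context below.\n{context}\n{question}"}

print(artifact_version(config))
print(artifact_version(prompt))
```

Sorting keys before hashing means two configs with the same content but different key order get the same version, while any change to a value produces a new one.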
What You've Built
By completing this series, you now have:
Foundation (Part 1):
- Well-organized workspace structure
- Separation of experiments, agents, and documentation
- Clear file naming conventions
- Shared utilities and templates
Knowledge System (Part 2):
- Memory bank for context persistence
- Learning logs for insights
- Project documentation
- Architecture decision records
Experiment Tracking (Part 3):
- MLflow integration
- Reproducible experiments
- Version-controlled configurations
- Result visualization
Production Agents (Part 4):
- Two-track system (prototype/production)
- Standardized agent structure
- Comprehensive testing
- Deployment readiness
Team Collaboration (Part 5):
- Version control strategy
- Model registry and lifecycle
- CI/CD automation
- Resource management
- Complete workflows
Series Conclusion
You've built a production-ready ML workspace that scales from individual research to full team collaboration. This system:
Accelerates Development:
- Templates reduce setup time
- Automation removes manual steps
- Clear structure reduces decisions
- Reusable components speed development
Improves Quality:
- Tests catch bugs early
- Code review improves design
- Standards ensure consistency
- Monitoring catches production issues
Enables Collaboration:
- Shared structure everyone understands
- Version control prevents conflicts
- Documentation enables handoffs
- Registry provides discoverability
Scales with Your Team:
- Works for solo researchers
- Supports small teams
- Scales to larger organizations
- Adapts to changing needs
Key Takeaways
- Structure enables speed - Good organization removes friction
- Automate everything - Scripts and CI/CD save countless hours
- Documentation is infrastructure - Undocumented work is wasted work
- Version all artifacts - Code, models, prompts, configs
- Test before production - Tests prevent disasters
- Monitor what matters - You can't improve what you don't measure
- Collaborate deliberately - Clear processes enable teamwork
- Iterate continuously - Prototype → Test → Improve → Repeat
Next Steps
Immediate:
- Set up your workspace structure (Part 1)
- Initialize version control
- Create first experiment with template
- Set up MLflow tracking
Short-term (1-2 weeks):
- Build memory bank system
- Create agent templates
- Implement basic CI/CD
- Set up model registry
Long-term (1-3 months):
- Establish team workflows
- Build automation scripts
- Deploy first production agent
- Iterate based on team feedback
Ongoing:
- Refine templates based on learnings
- Update documentation regularly
- Share successful patterns
- Continuously improve automation
Resources
Code Examples:
- All scripts from this series (GitHub link)
- Template repository
- Example experiments
- Sample agents
Further Reading:
- Part 4: Production-Ready AI Agent Templates. Build production-ready AI agents with standardized templates, tool integration patterns, comprehensive testing, and deployment readiness frameworks.
- Part 3: Experiment Tracking and Reproducibility. Build systematic experiment tracking with templates, progress monitoring, and lifecycle management to ensure every ML experiment is reproducible and builds toward knowledge.
Tools:
- MLflow for experiment tracking
- Ollama for local LLMs
- Docker for containerization
- GitHub Actions for CI/CD
Series Navigation
- Part 1: Designing an Organized Structure
- Part 2: Documentation Systems That Scale
- Part 3: Experiment Tracking and Reproducibility
- Part 4: Production-Ready AI Agent Templates
- Part 5: Team Collaboration and Workflow Integration (this article)
Questions or want to share your workspace setup? Find me on Twitter @bioinfo or at rundatarun.io
Thank you for following this series! I hope it helps you build better ML workflows and more effective teams.
About the Author: Justin Johnson builds AI systems and writes about practical AI development.
justinhjohnson.com | Twitter | LinkedIn | Run Data Run | Subscribe