# Building a Production ML Workspace: Part 5 - Team Collaboration and Workflow Integration
You've built the foundation: workspace structure, documentation systems, experiment tracking, and production-ready agents. Your individual workflows are solid. But **ML development is a team sport**.
When multiple researchers share GPU infrastructure, collaborate on experiments, and deploy agents to production, new challenges emerge:
- How do team members discover and reuse each other's work?
- How do you prevent conflicts when multiple people experiment simultaneously?
- How do you maintain consistency across different developer environments?
- How do you automate the workflow from experiment to production deployment?
- How do you handle model versioning and Ollama model lifecycle management?
This final article completes the series by showing you how to **integrate everything into a collaborative, automated workflow** that scales from solo research to full team production.
<div class="callout" data-callout="info">
<div class="callout-title">About This Series</div>
<div class="callout-content">
This is Part 5 (final) of a 5-part series on building production ML workspaces:
- [[building-production-ml-workspace-part-1-structure|Part 1: Workspace Structure]]
- [[building-production-ml-workspace-part-2-documentation|Part 2: Documentation Systems]]
- [[building-production-ml-workspace-part-3-experiments|Part 3: Experiment Tracking]]
- [[building-production-ml-workspace-part-4-agents|Part 4: Production-Ready AI Agent Templates]]
This article ties everything together with team collaboration and workflow automation.
</div>
</div>
---
## The Team Collaboration Problem
Individual productivity is different from team effectiveness. Here's what breaks down when teams scale:
**Discovery Problems:**
- "Did someone already try this approach?"
- "Where's the trained model Sarah mentioned?"
- "Which agent template should I start from?"
- "What experiments ran last week?"
**Conflict Problems:**
- Two researchers overwrite each other's experiments
- Model files conflict in shared directories
- GPU allocation conflicts during training
- Inconsistent Python environments cause "works on my machine" failures
**Quality Problems:**
- No peer review before production deployment
- Undocumented experiments no one can reproduce
- Agents deployed without proper testing
- Configuration drift across environments
**Workflow Problems:**
- Manual steps between experiment and deployment
- No clear promotion path from prototype to production
- Unclear ownership of models and agents
- No standardized release process
---
## The Integrated Workflow Solution
Our solution: **Automated workflows with clear promotion paths and team visibility**.
```
Integrated ML Workflow
│
├── Individual Development
│ ├── Local experiments with tracking
│ ├── Personal branches for prototypes
│ ├── Automated environment setup
│ └── Self-service model management
│
├── Team Collaboration
│ ├── Shared experiment registry
│ ├── Code review for production code
│ ├── Automated testing gates
│ └── Centralized model registry
│
├── Production Pipeline
│ ├── Automated deployment
│ ├── Model versioning
│ ├── Monitoring & alerting
│ └── Rollback capabilities
│
└── Governance
├── Resource allocation
├── Cost tracking
├── Compliance & security
└── Knowledge sharing
```
---
## Part 1: Version Control Strategy
### Repository Structure
Monorepo approach for shared workspace:
```bash
ml-workspace/
├── .git/
├── .github/
│ └── workflows/ # CI/CD automation
│ ├── experiment-validation.yml
│ ├── agent-tests.yml
│ └── deploy-production.yml
├── experiments/
│ ├── active/ # Current experiments
│ │ └── [researcher]/ # Personal namespace
│ └── archive/ # Completed experiments
├── agents/
│ ├── prototypes/ # WIP agents (no review needed)
│ └── production/ # Reviewed production agents
├── models/
│ ├── registry.yaml # Model catalog
│ └── checkpoints/ # Versioned model files
├── shared/ # Team utilities
│ ├── tools/ # Common tools
│ ├── prompts/ # Reusable prompts
│ └── configs/ # Standard configs
├── docs/
│ ├── runbooks/ # Operational guides
│ └── adrs/ # Architecture decisions
└── scripts/
├── setup-env.sh # Environment setup
├── sync-ollama.sh # Model sync
└── deploy-agent.sh # Deployment automation
```
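The `scripts/` entries above carry most of the onboarding weight. As a rough sketch of what `setup-env.sh` might contain (the `requirements.txt` file and the `.venv` layout are assumptions; adapt them to your stack):

```bash
#!/bin/bash
# Minimal environment bootstrap (sketch). Assumes a requirements.txt at the
# workspace root and a locally installed Ollama.
set -e

# 1. Project-local virtual environment so everyone resolves the same deps
python3 -m venv .venv
source .venv/bin/activate

# 2. Pinned Python dependencies
pip install --upgrade pip
pip install -r requirements.txt

# 3. Ollama must be present before models can be synced
if ! command -v ollama >/dev/null 2>&1; then
    echo "❌ Ollama not found. Install it from https://ollama.com first."
    exit 1
fi

echo "✅ Environment ready. Run scripts/sync-ollama.sh to pull team models."
```

The point is less the exact commands than that a new team member runs one script and lands in the same environment as everyone else.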
### Branching Strategy
**Branch Types:**
```bash
main # Production-ready code
├── develop # Integration branch
├── experiment/* # Individual experiments
├── feature/* # New capabilities
└── hotfix/* # Production fixes
```
**Workflow:**
```bash
# Start new experiment
git checkout develop
git checkout -b experiment/username/model-comparison
# Work on experiment
# ... run experiments, document results ...
# Share experiment (no merge)
git push origin experiment/username/model-comparison
# Promote to production (requires review)
# Open a PR from the experiment branch into develop
# ... PR review and tests pass ...
git checkout develop
git merge experiment/username/model-comparison
git checkout main
git merge develop
```
### What to Commit vs. What to Ignore
**.gitignore Configuration:**
```bash
# Commit these:
# - Experiment code
# - Configuration files
# - Documentation
# - Small reference datasets (<10MB)
# - Model registry metadata
# Ignore these (use .gitignore):
*.pyc
__pycache__/
.ipynb_checkpoints/
*.log
.env
# Large files
*.pth
*.safetensors
*.gguf
datasets/large/
models/checkpoints/*.bin
# Experiment artifacts (tracked separately)
experiments/*/outputs/
experiments/*/runs/
mlruns/
wandb/
# Personal configs
.vscode/
.idea/
*.swp
```
**Large File Strategy:**
```bash
# Use Git LFS for model files
git lfs track "*.pth"
git lfs track "*.safetensors"
# Or use external storage with manifests
models/
├── registry.yaml # Committed (metadata only)
└── checkpoints/
└── .gitignore # Ignore actual files
# Actual files stored in shared NAS or S3
```
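If you take the external-storage route, a small fetch script keeps the registry as the single source of truth for where checkpoints live. A sketch, assuming a hypothetical `storage_uri` field on each registry entry and S3 as the backend (requires `yq` and the AWS CLI):

```bash
#!/bin/bash
# scripts/fetch-checkpoint.sh (sketch) - pull a checkpoint listed in the registry
set -e

MODEL_NAME="$1"
REGISTRY_FILE="models/registry.yaml"

# Look up where the checkpoint is stored (storage_uri is an assumed field)
URI=$(yq eval ".models.\"${MODEL_NAME}\".storage_uri" "$REGISTRY_FILE")

if [ -z "$URI" ] || [ "$URI" = "null" ]; then
    echo "❌ No storage_uri recorded for ${MODEL_NAME} in ${REGISTRY_FILE}"
    exit 1
fi

# Download into the git-ignored checkpoints directory
mkdir -p "models/checkpoints/${MODEL_NAME}"
aws s3 cp "$URI" "models/checkpoints/${MODEL_NAME}/" --recursive
echo "✅ Fetched ${MODEL_NAME} from ${URI}"
```

The same pattern works for a shared NAS: swap the `aws s3 cp` line for `rsync` against the mount point.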
---
## Part 2: Ollama Model Management
### Model Registry System
Centralized catalog of all models:
**models/registry.yaml:**
```yaml
models:
llama3.1-8b-base:
source: "ollama"
model_name: "llama3.1:8b"
version: "latest"
purpose: "General purpose chat and reasoning"
tags: ["base", "chat", "reasoning"]
owners: ["team"]
created: "2024-10-01"
updated: "2024-10-15"
medical-assistant-v2:
source: "custom"
base_model: "llama3.1:8b"
modelfile: "modelfiles/medical-assistant-v2.txt"
version: "v2.1.0"
purpose: "Medical query assistant with RAG"
tags: ["custom", "medical", "rag"]
owners: ["sarah"]
created: "2024-10-10"
updated: "2024-10-18"
performance:
accuracy: 0.89
latency_p50: "850ms"
latency_p95: "1.2s"
code-reviewer-v1:
source: "custom"
base_model: "codellama:13b"
modelfile: "modelfiles/code-reviewer-v1.txt"
version: "v1.0.0"
purpose: "Code review and security analysis"
tags: ["custom", "code", "security"]
owners: ["john"]
created: "2024-10-15"
status: "production"
```
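Because the registry is plain YAML, it doubles as a queryable catalog. A couple of examples with `yq` v4 (the same tool the sync script below relies on):

```bash
# Which models are marked production?
yq eval '.models | to_entries | .[] | select(.value.status == "production") | .key' \
  models/registry.yaml

# What is medical-assistant-v2 for, and who owns it?
yq eval '.models."medical-assistant-v2" | {"purpose": .purpose, "owners": .owners}' \
  models/registry.yaml
```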
### Modelfile Version Control
**modelfiles/medical-assistant-v2.txt:**
```dockerfile
FROM llama3.1:8b
# System prompt
SYSTEM """You are a medical research assistant with expertise in clinical
trials and biomedical literature. Provide accurate, evidence-based responses
with citations when possible. Always acknowledge uncertainty."""
# Parameters optimized for medical domain
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
# Custom stop sequences
PARAMETER stop "###"
PARAMETER stop "<END>"
# Template for structured responses
TEMPLATE """### Question: {{ .Prompt }}
### Response:
{{ .Response }}
### Confidence: [High/Medium/Low]
### Citations: [If applicable]
"""
```
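With the Modelfile under version control, anyone on the team can rebuild exactly the same model locally:

```bash
# Build (or rebuild) the custom model from the committed Modelfile
ollama create medical-assistant-v2 -f modelfiles/medical-assistant-v2.txt

# Quick smoke test before relying on it
ollama run medical-assistant-v2 "Summarize the purpose of a phase III clinical trial."
```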
### Model Sync Script
**scripts/sync-ollama.sh:**
```bash
#!/bin/bash
# Sync Ollama models across team based on registry
set -e
REGISTRY_FILE="models/registry.yaml"
MODELFILES_DIR="modelfiles"
echo "🔄 Syncing Ollama models from registry..."
# Parse registry and pull/build models
# (Requires yq for YAML parsing)
# Pull base models
echo "📥 Pulling base models..."
yq eval '.models[] | select(.source == "ollama") | .model_name' "$REGISTRY_FILE" | \
while read -r model; do
echo " Pulling $model..."
ollama pull "$model"
done
# Build custom models from Modelfiles
echo "🔨 Building custom models..."
yq eval '.models | to_entries | .[] | select(.value.source == "custom") | [.key, .value.modelfile] | @tsv' "$REGISTRY_FILE" | \
while IFS=