Building a Production ML Workspace: Part 1 - Designing an Organized Structure
When you start working with GPU infrastructure for machine learning—whether it's a DGX workstation, cloud GPUs, or local hardware—one of the first challenges you face isn't technical complexity. It's organizational chaos.
You'll be juggling Ollama models, fine-tuning experiments, agent prototypes, datasets, training checkpoints, and Jupyter notebooks. Without a clear structure from day one, you'll waste precious time searching for files, recreating experiments, and wondering "which model was that again?"
This article shows you how to design a workspace structure that scales from your first experiment to hundreds of concurrent projects, with a specific focus on Ollama integration and LLM development workflows.
This is Part 1 of a 5-part series on building production ML workspaces on GPU infrastructure. Each part stands alone but builds toward a complete system:
- Part 1: Workspace Structure (you are here)
- Part 2: Documentation Systems
- Part 3: Experiment Tracking
- Part 4: Agent Templates
- Part 5: Team Collaboration and Workflow Integration
Why Organization Matters for ML Work
Before diving into the structure, let's understand why this matters:
The Problem: ML projects generate a lot of artifacts:
- Downloaded models (5-100+ GB each)
- Training checkpoints (multiple per experiment)
- Datasets in various processing stages
- Experimental code and notebooks
- Results, logs, and visualizations
- Custom models and configurations
The Cost: Without organization:
- 30+ minutes per day searching for files
- Lost experiment context (what were we testing?)
- Duplicate work (did we already try this?)
- Difficulty reproducing results
- Hard to track what actually works
The Solution: A well-designed directory structure that:
- Makes file locations obvious
- Separates concerns cleanly
- Scales to hundreds of projects
- Supports your workflow naturally
- Reduces cognitive load
Core Design Principles
Our workspace structure follows five key principles:
1. Separation of Concerns
Different types of work live in different directories. Ollama models don't mix with fine-tuning experiments. Agents don't mix with datasets. This makes searching faster and reduces cross-contamination.
2. Tool-Specific Organization
Ollama has unique workflows (Modelfiles, pull, create) that deserve dedicated space. Generic ML models go elsewhere. This reflects how you actually work with these tools.
3. Lifecycle Management
Experiments move through states: active → completed → archived. Your directory structure should reflect these lifecycles, keeping your active work focused.
4. Timestamped Naming
Use YYYYMMDD-HHMMSS-descriptive-name for experiments. This provides chronological sorting, unique names, and temporal context at a glance.
5. Discoverability
Names should be intuitive. If you need a dataset, look in datasets/. If you need an Ollama model, look in ollama-models/. No guessing.
The Complete Workspace Structure
Here's the full structure we'll build. Don't worry—we'll break down each section:
(local path)
├── ollama-models/ # Ollama-specific work
│ ├── custom/ # Custom models you created
│ ├── downloaded/ # Models pulled from registry
│ └── modelfiles/ # Modelfile version control
│
├── model-tuning/ # Fine-tuning workspace
│ ├── datasets/ # Training datasets
│ ├── configs/ # Training configurations
│ ├── checkpoints/ # Training checkpoints
│ ├── fine-tuned-models/ # Final outputs
│ └── evaluation/ # Evaluation results
│
├── agents/ # AI agent development
│ ├── prototypes/ # Experimental agents
│ ├── production/ # Production-ready agents
│ ├── libraries/ # Shared components
│ └── tools/ # Agent-specific tools
│
├── datasets/ # Data management
│ ├── raw/ # Original data
│ ├── processed/ # Cleaned data
│ └── annotations/ # Labeled data
│
├── experiments/ # Experiment tracking
│ ├── active/ # Running experiments
│ ├── completed/ # Finished experiments
│ └── archived/ # Historical reference
│
├── notebooks/ # Jupyter development
│ ├── exploration/ # Data exploration
│ ├── analysis/ # Results analysis
│ └── demos/ # Demonstrations
│
├── scripts/ # Automation
│ ├── utilities/ # Helper scripts
│ ├── automation/ # Workflow automation
│ └── deployment/ # Deployment scripts
│
├── results/ # Experiment outputs
├── logs/ # Comprehensive logging
│ ├── experiments/
│ ├── training/
│ └── agents/
│
└── templates/ # Project templates
├── experiment/
└── agent/
Breaking Down Each Directory
Ollama Models Directory
ollama-models/
├── custom/ # Models you create from Modelfiles
├── downloaded/ # Models pulled from Ollama registry
└── modelfiles/ # Version-controlled Modelfiles
Why separate from other models?
- Ollama has specific workflows (pull, create, run)
- Modelfiles need version control
- Custom models need different tracking than downloads
- Easy to see what you've created vs. downloaded
Example usage:
# Pull a model
ollama pull llama3.1
# Create custom model
cd ollama-models/modelfiles/my-assistant/
ollama create my-assistant:v1 -f v1-Modelfile
# Record the pulled model under downloaded/ and the created model under custom/
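A versioned Modelfile for the create step above could be generated with a small helper. This is a minimal sketch in POSIX shell: the write_modelfile function, the system prompt, and the temperature value are illustrative assumptions, not files from this series. FROM, PARAMETER, and SYSTEM are standard Modelfile instructions.

```shell
# Hypothetical helper: write <dir>/<version>-Modelfile for a given base model.
# Arguments: target directory, version label (e.g. v1), base model name.
write_modelfile() {
    dir=$1 version=$2 base=$3
    mkdir -p "$dir"
    # The prompt and temperature below are placeholders to edit per model.
    cat > "$dir/$version-Modelfile" <<EOF
FROM $base
PARAMETER temperature 0.3
SYSTEM """You are a concise assistant for internal support questions."""
EOF
    # Print the path so callers can pass it to: ollama create <name> -f <path>
    echo "$dir/$version-Modelfile"
}
```

For example, write_modelfile ollama-models/modelfiles/my-assistant v1 llama3.1 would produce the v1-Modelfile used in the commands above. Keeping one file per version makes it easy to diff prompt changes between releases.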
Naming convention for custom models:
base-model-purpose-version
Examples:
llama3-customer-support-v1
mistral-code-reviewer-v2
phi3-summarizer-v1
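The naming convention is easy to apply consistently if names are generated rather than typed. A minimal sketch, assuming a hypothetical custom_model_name helper that takes the base model, a free-text purpose, and a version number:

```shell
# Hypothetical helper: build a name following base-model-purpose-version.
# Arguments: base model, purpose (free text), version number.
custom_model_name() {
    base=$1 purpose=$2 version=$3
    # Lowercase the purpose and turn spaces or underscores into hyphens.
    slug=$(printf '%s' "$purpose" | tr '[:upper:]' '[:lower:]' | sed 's/[ _]/-/g')
    printf '%s-%s-v%s\n' "$base" "$slug" "$version"
}

custom_model_name llama3 "Customer Support" 1
# prints: llama3-customer-support-v1
```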
Model Tuning Directory
model-tuning/
├── datasets/ # Training data
├── configs/ # Training configurations
├── checkpoints/ # Model checkpoints during training
├── fine-tuned-models/ # Final trained models
└── evaluation/ # Evaluation metrics and reports
Complete workflow:
- Prepare dataset → datasets/
- Create config → configs/
- Train model → checkpoints saved to checkpoints/
- Evaluate → results in evaluation/
- Export final → fine-tuned-models/
- Import to Ollama → create Modelfile, add to ollama-models/custom/
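One way to keep the stages of a single training run connected is to use the same run name in each stage directory. The sketch below assumes a hypothetical new_tuning_run helper and a placeholder YAML config with a single run_name key; real config keys depend on the training framework you use.

```shell
# Hypothetical helper: create per-run locations for each pipeline stage.
# Arguments: workspace root, run name.
new_tuning_run() {
    root=$1 run=$2
    tuning="$root/model-tuning"
    mkdir -p "$tuning/configs" \
             "$tuning/checkpoints/$run" \
             "$tuning/evaluation/$run" \
             "$tuning/fine-tuned-models/$run"
    # Write a placeholder config only if one does not exist yet.
    config="$tuning/configs/$run.yaml"
    [ -e "$config" ] || printf 'run_name: %s\n' "$run" > "$config"
    echo "$config"
}
```

With a shared run name, the checkpoints, evaluation results, and exported model for one experiment can all be found from the config file name alone.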
Why this structure:
- Clear pipeline from data to deployed model
- Checkpoints separate from final models
- Configs version-controlled
- Evaluation results easily comparable
Agents Directory
agents/
├── prototypes/ # Experimental, rapidly changing
├── production/ # Stable, tested, documented
├── libraries/ # Shared code between agents
└── tools/ # Agent-specific tools
Agent organization pattern:
agents/production/customer-support-agent/
├── README.md # Full documentation
├── agent.py # Main code
├── config.yaml # Configuration
├── requirements.txt # Dependencies
├── prompts/ # Prompt templates
├── tools/ # Agent tools
├── tests/ # Tests
└── logs/ # Execution logs
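The layout above can be scaffolded in one step. This is a sketch under stated assumptions: scaffold_agent is a hypothetical helper, and it only creates empty placeholder files; the templates covered in Part 4 would normally fill them in.

```shell
# Hypothetical helper: scaffold agents/<stage>/<name>/ with the layout above.
# Arguments: workspace root, stage (prototypes or production), agent name.
scaffold_agent() {
    root=$1 stage=$2 name=$3
    agent_dir="$root/agents/$stage/$name"
    mkdir -p "$agent_dir/prompts" "$agent_dir/tools" \
             "$agent_dir/tests" "$agent_dir/logs"
    # Create empty placeholder files without overwriting existing work.
    for f in README.md agent.py config.yaml requirements.txt; do
        [ -e "$agent_dir/$f" ] || : > "$agent_dir/$f"
    done
    echo "$agent_dir"
}
```

Because existing files are left alone, the helper is safe to re-run on an agent that is already in progress.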
Prototype vs. Production:
- Prototypes: Fast iteration, minimal docs, breaking changes OK
- Production: Stable API, comprehensive docs, tests, versioning
Experiments Directory
experiments/
├── active/ # Currently running
├── completed/ # Done but recent (< 1 month)
└── archived/ # Historical (> 1 month old)
Experiment naming:
YYYYMMDD-HHMMSS-descriptive-name
Examples:
20241019-143000-llama-sentiment-analysis
20241019-155500-phi3-code-generation-benchmark
20241020-090000-custom-model-evaluation
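Names in this format can be generated from a plain description. A minimal sketch, assuming a hypothetical experiment_name helper; the optional second argument lets you supply a fixed timestamp instead of the current time.

```shell
# Hypothetical helper: build YYYYMMDD-HHMMSS-descriptive-name.
# Arguments: description, optional timestamp (defaults to the current time).
experiment_name() {
    desc=$1
    stamp=${2:-$(date +%Y%m%d-%H%M%S)}
    # Lowercase, collapse runs of non-alphanumerics into one hyphen,
    # then trim any leading or trailing hyphen.
    slug=$(printf '%s' "$desc" | tr '[:upper:]' '[:lower:]' \
        | sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//')
    printf '%s-%s\n' "$stamp" "$slug"
}

experiment_name "Llama sentiment analysis" 20241019-143000
# prints: 20241019-143000-llama-sentiment-analysis
```

Since the timestamp comes first, a glob such as experiments/*/20241019-* selects everything started on one day.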
Lifecycle:
- Create in active/ from template
- Run experiment, document results
- Move to completed/ when done
- After 1 month, move to archived/
Benefits:
- active/ stays clean (only current work)
- completed/ for recent reference
- archived/ keeps history without clutter
- Chronological sorting with timestamps
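The lifecycle moves can be scripted so they happen the same way every time. The sketch below assumes two hypothetical helpers, complete_experiment and archive_before. It decides what to archive from the date prefix in the directory name rather than from file modification times, and it takes the cutoff date as an explicit YYYYMMDD argument.

```shell
# Hypothetical helpers for the active -> completed -> archived lifecycle.

# Move one experiment from active/ to completed/.
# Arguments: workspace root, experiment directory name.
complete_experiment() {
    root=$1 name=$2
    mkdir -p "$root/experiments/completed"
    mv "$root/experiments/active/$name" "$root/experiments/completed/$name"
}

# Move completed experiments whose YYYYMMDD prefix is before the cutoff.
# Arguments: workspace root, cutoff date as YYYYMMDD.
archive_before() {
    root=$1 cutoff=$2
    mkdir -p "$root/experiments/archived"
    for path in "$root"/experiments/completed/*; do
        [ -d "$path" ] || continue
        name=${path##*/}
        day=${name%%-*}
        # Skip anything that does not start with an 8-digit date.
        case $day in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
            *) continue ;;
        esac
        if [ "$day" -lt "$cutoff" ]; then
            mv "$path" "$root/experiments/archived/$name"
        fi
    done
}
```

On systems with GNU date, a one-month cutoff can be computed with date -d '1 month ago' +%Y%m%d and passed as the second argument; other platforms need their own equivalent.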
Datasets Directory
datasets/
├── raw/ # Original, unmodified data
├── processed/ # Cleaned, transformed data
└── annotations/ # Labels, tags, metadata
Best practices:
- Never modify raw/ - immutable source of truth
- Document processing steps (scripts in scripts/utilities/)
- Symlink large datasets instead of copying
- Use consistent naming: datasetname_YYYYMMDD_version
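These practices translate into a few short commands. A sketch under stated assumptions: dataset_name, link_raw_dataset, and freeze_raw are hypothetical helpers, and removing write permission is only one possible way to enforce the "never modify raw/" rule.

```shell
# Hypothetical helpers for the dataset practices above.

# Build a name following datasetname_YYYYMMDD_version.
# Arguments: dataset name, date as YYYYMMDD, version label.
dataset_name() {
    printf '%s_%s_%s\n' "$1" "$2" "$3"
}

# Symlink an existing dataset into datasets/raw/ instead of copying it.
# Arguments: workspace root, path to the existing data, link name.
link_raw_dataset() {
    root=$1 src=$2 name=$3
    mkdir -p "$root/datasets/raw"
    ln -s "$src" "$root/datasets/raw/$name"
}

# Remove write permission from datasets/raw/ and the files stored in it.
# Data reached through a symlink should be protected at its own location.
freeze_raw() {
    chmod -R a-w "$1/datasets/raw"
}
```

Pass an absolute path as the source when linking, since a relative symlink target is resolved from the link's own directory rather than from where the command was run.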
Notebooks Directory
notebooks/
├── exploration/ # Initial data exploration
├── analysis/ # Results analysis
└── demos/ # Demonstrations and presentations
Naming convention:
YYYYMMDD-purpose-description.ipynb
Examples:
20241019-exploration-sentiment-dataset.ipynb
20241019-analysis-llama-vs-phi3-performance.ipynb
20241020-demo-custom-model-showcase.ipynb
Scripts Directory
scripts/
├── utilities/ # Helper scripts (data processing, etc.)
├── automation/ # Workflow automation
└── deployment/ # Deployment scripts
Examples:
- utilities/convert_data.py - Data format conversion
- automation/run_benchmark.sh - Automated benchmarking
- deployment/deploy_agent.py - Agent deployment
Logs Directory
logs/
├── experiments/ # Experiment execution logs
├── training/ # Training logs
└── agents/ # Agent execution logs
Log naming:
YYYYMMDD-HHMMSS-activity.log
Examples:
20241019-143000-sentiment-experiment.log
20241019-155500-model-training.log
20241020-090000-agent-execution.log
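A small wrapper can apply the log naming automatically whenever a command runs. This is a minimal sketch, assuming a hypothetical run_logged helper; it captures both stdout and stderr in the log file and passes the command's exit status back to the caller.

```shell
# Hypothetical helper: run a command and capture its output in
# logs/<category>/YYYYMMDD-HHMMSS-<activity>.log.
# Arguments: workspace root, category, activity, then the command to run.
run_logged() {
    root=$1 category=$2 activity=$3
    shift 3
    log_dir="$root/logs/$category"
    mkdir -p "$log_dir"
    log_file="$log_dir/$(date +%Y%m%d-%H%M%S)-$activity.log"
    # Capture stdout and stderr, remembering the command's exit status.
    "$@" > "$log_file" 2>&1 && status=0 || status=$?
    # Print the log path so the caller can find the output.
    echo "$log_file"
    return "$status"
}
```

For example, run_logged "$WORKSPACE" training model-training python train.py would write to logs/training/ under the workspace root, where WORKSPACE and train.py are placeholders for your own root path and script.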
Templates Directory
templates/
├── experiment/ # Experiment template
└── agent/ # Agent template
These contain pre-built structures for rapid project creation. We'll cover these in detail in Part 3 (Experiments) and Part 4 (Agents).
Creating the Structure
Here are the commands to create the complete workspace:
# Create main directories
mkdir -p (local path)
# Ollama structure
mkdir -p (local path)
# Model tuning structure
mkdir -p (local path)
# Agents structure
mkdir -p (local path)
# Datasets structure
mkdir -p (local path)
# Experiments structure
mkdir -p (local path)
# Notebooks structure
mkdir -p (local path)
# Scripts structure
mkdir -p (local path)
# Logs structure
mkdir -p (local path)
# Templates structure
mkdir -p (local path)
Verify creation:
cd (local path)
find . -type d | head -30
You should see the first 30 directories, laid out as designed. The full tree shown earlier has 39 directories, so drop the head filter to list them all.
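The same tree can also be created by a single reusable function. This is a sketch, not the author's script: create_workspace and make_group are hypothetical names, and the workspace root is whatever path you pass in. The directory names are taken from the structure shown earlier.

```shell
# Hypothetical helper: create a parent directory and its listed children.
make_group() {
    parent=$1
    shift
    mkdir -p "$parent"
    for child in "$@"; do
        mkdir -p "$parent/$child"
    done
}

# Create the full workspace tree under the given root directory.
create_workspace() {
    root=$1
    make_group "$root/ollama-models" custom downloaded modelfiles
    make_group "$root/model-tuning" datasets configs checkpoints fine-tuned-models evaluation
    make_group "$root/agents" prototypes production libraries tools
    make_group "$root/datasets" raw processed annotations
    make_group "$root/experiments" active completed archived
    make_group "$root/notebooks" exploration analysis demos
    make_group "$root/scripts" utilities automation deployment
    make_group "$root/results"
    make_group "$root/logs" experiments training agents
    make_group "$root/templates" experiment agent
}
```

Because mkdir -p ignores directories that already exist, calling create_workspace on an existing workspace only adds whatever is missing.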
Real-World Benefits
After implementing this structure, you'll experience:
Time Saved:
- ~30 minutes per day (no file searching)
- ~15 minutes per new project (template usage)
- ~10 minutes per experiment (clear organization)
Quality Improvements:
- Consistent project structure
- Better documentation (templates prompt it)
- Easier collaboration (others understand structure)
- Reproducible work (everything in its place)
Cognitive Benefits:
- Less decision fatigue ("where should this go?")
- Clear mental model of workspace
- Easier to return after breaks
- Reduced stress from chaos
Design Decisions Explained
Why Timestamp-Based Experiment Naming?
Decision: Use YYYYMMDD-HHMMSS-descriptive-name
Rationale:
- Chronological sorting works naturally with ls
- No naming collisions (even with same description)
- Temporal context visible in name
- Easy to search by date range
Alternative considered: Simple numbering (exp-001, exp-002)
Rejected because: Numbers don't convey temporal information
Why Separate Ollama Directory?
Decision: Dedicated ollama-models/ top-level directory
Rationale:
- Ollama has tool-specific workflows
- Modelfiles need special tracking
- Different from generic ML models
- Easy discoverability
Alternative considered: Generic models/ directory
Rejected because: Ollama-specific needs warrant dedicated space
Why Three-Stage Experiment Lifecycle?
Decision: Active/Completed/Archived
Rationale:
- Active: High priority, currently working
- Completed: Done but recent, easy reference
- Archived: Historical record without clutter
Benefits:
- Clean active directory
- Recent work accessible
- History preserved
- Manageable growth
Alternative considered: Just active/completed
Rejected because: Completed grows forever without archival stage
Integration with Development Tools
This workspace structure integrates naturally with:
Claude Code / AI Assistants:
- Clear structure guides AI suggestions
- Templates provide context
- Documentation in predictable locations
Git Version Control:
# Initialize in specific subdirectories
cd (local path)
git init
# Or workspace-wide tracking
cd (local path)
git init
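With workspace-wide tracking, large binary artifacts usually need to stay out of the repository. The sketch below assumes a hypothetical write_gitignore helper, and the ignore list is an assumption about what you would rather regenerate than version. It leaves Modelfiles, configs, scripts, and notebooks tracked. The .gguf and .safetensors patterns cover two common model weight formats.

```shell
# Hypothetical helper: write a .gitignore for workspace-wide tracking so
# large or regenerable artifacts stay out of the repository.
# Argument: workspace root (must already exist).
write_gitignore() {
    root=$1
    cat > "$root/.gitignore" <<'EOF'
# Large or regenerable artifacts (adjust to your own layout)
model-tuning/checkpoints/
model-tuning/fine-tuned-models/
datasets/raw/
datasets/processed/
results/
logs/
# Common model weight formats
*.gguf
*.safetensors
EOF
}
```

Run it once before the first commit, then review the list against what you actually want to keep in history.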
Jupyter Notebooks:
- Organized by purpose (exploration/analysis/demos)
- Easy to find related notebooks
- Clear lifecycle (experimental → production)
Ollama CLI:
- Model files in predictable locations
- Version-controlled Modelfiles
- Clear custom vs. downloaded distinction
What's Next
You now have a battle-tested workspace structure that can handle hundreds of ML projects without becoming chaotic. But structure alone isn't enough—you need systems to track your work, document experiments, and capture insights.
In Part 2: Documentation Systems, we'll build a three-tier logging system that:
- Captures execution details for debugging
- Tracks daily activity for review
- Creates session narratives for blog posts
- Maintains experiment history
- Builds a model registry
We'll also create activity tracking systems that turn your ML work into reproducible, shareable knowledge.
Key Takeaways
- Separation of concerns prevents chaos as projects grow
- Tool-specific directories (like ollama-models/) reflect real workflows
- Timestamped naming provides chronological organization
- Lifecycle management keeps active work focused
- Templates (coming in Part 3 & 4) accelerate project creation
- 30 minutes saved daily with proper organization
Resources
Workspace Templates:
- Full structure creation script (provided above)
- Template files (covered in Parts 3 & 4)
Related Reading:
- Ollama Documentation: https://github.com/ollama/ollama
- Git for ML: Version control best practices
- Experiment tracking patterns
Series Navigation
- Next: Part 2: Documentation Systems
- Series Home: Building a Production ML Workspace on GPU Infrastructure
Questions or suggestions? Find me on Twitter @bioinfo or at rundatarun.io
About the Author: Justin Johnson builds AI systems and writes about practical AI development.