# Hardening a Production System with 40 Parallel AI Agents
I needed to understand my system. Not surface-level monitoring, but complete visibility: security vulnerabilities, configuration drift across three machines, performance bottlenecks, integration failures, everything that could go wrong quietly over months of incremental changes.
Manual audit? Impossible. Too many layers. Too many interconnected services. Automation scripts calling MCP servers calling remote systems calling databases. 100+ shell scripts. 44 LaunchAgents. 14 MCP servers. 5 skills. Three machines (Mac, GPU server, Raspberry Pi).
The approach: 40 specialized AI agents running in parallel. Concentric exploration from core infrastructure outward. Four waves. Each agent with domain expertise (security scanning, database architecture, network configuration, cost analysis).
The agents found critical issues hiding in plain sight. Multiple VPN daemons fighting for network control. Boot time killed by forgotten services. Hook duplication costing 600ms per tool use. Configuration drift between machines. Attack surface from 81 apps with accessibility permissions.
This was system hardening at scale.
## The Concentric Exploration Pattern
The approach was concentric: start at the core (system performance, running processes) and expand outward in waves until you've covered the entire system.
**Wave 1: Core Infrastructure** (10 agents)
- System performance metrics
- Storage usage patterns
- Running services and daemons
- MCP server health
- Active skills and automation
- Network configuration
- Docker containers
- Python environments
- Database inventory
- Obsidian vault integration
**Wave 2: Security & Remote Systems** (10 agents)
- Security vulnerabilities
- Remote system health (SSH to GPU server and Raspberry Pi)
- Script inventory and dependencies
- API cost analysis
- Database health checks
- Log pattern analysis
- Configuration drift detection
- Integration testing
- Performance profiling
- Git repository inventory
**Wave 3: Application Ecosystem** (10 agents)
- Homebrew package audit
- Shell configuration analysis
- macOS integrations (Shortcuts, Automator)
- Cloud service status
- Application config sizes
- Certificate management
- Network topology
- Browser extensions
- Node.js ecosystem
- Font and asset inventory
**Wave 4: Deep Application Analysis** (10 agents)
- VSCode/Cursor configuration
- Media and downloads cleanup
- Terminal configurations
- System modifications audit
- Development tool versions
- System log security scan
- Temporary file analysis
- Privacy permission audit
- Startup performance breakdown
Each wave built on the previous one. By the time we reached Wave 4, the agents had context from earlier discoveries and could make connections across the system.
## Parallel Agent Architecture
The key to making this work: run everything in parallel. Claude Code supports concurrent agent execution. Instead of waiting for one agent to finish before starting the next, I launched 10 agents simultaneously for each wave.
```python
# Conceptual structure (actual implementation uses Claude Code's Task tool)
from asyncio import gather

agents = [
    Task("homelab-monitor", "Check system performance", model="haiku"),
    Task("parallel-explorer", "Map storage usage", model="haiku"),
    Task("security-scanner", "Audit vulnerabilities", model="haiku"),
    Task("database-architect", "Analyze database health", model="haiku"),
    # ... 6 more agents
]

# All start at once, return when complete
results = await gather(*agents)
```
Each agent ran independently. The `homelab-monitor` agent checked CPU, memory, disk, and thermal state while the `parallel-explorer` agent scanned directory structures. The `security-scanner` looked for exposed credentials while `database-architect` analyzed SQLite files and vector databases.
Total wall-clock time for 10 agents: about the same as running one agent. Parallelism matters.
> [!tip] Model Selection Strategy
> Use `haiku` for fast, straightforward tasks (inventory, grep, file stats). Use `sonnet` for analysis that requires reasoning. Use `opus` for planning and synthesis. The right model for the task saves both time and cost.
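To make that routing concrete, here's a minimal sketch of how tasks could be mapped to models. The keyword lists are illustrative assumptions, not the exact rules used during this audit.
```python
# Minimal model-routing sketch. The keyword lists are illustrative
# assumptions, not the exact rules used during this audit.
def pick_model(task: str) -> str:
    """Route inventory work to haiku, analysis to sonnet, planning to opus."""
    text = task.lower()
    if any(w in text for w in ("synthesize", "plan", "prioritize")):
        return "opus"
    if any(w in text for w in ("analyze", "profile", "cross-reference")):
        return "sonnet"
    return "haiku"  # default: inventory, grep, file stats

print(pick_model("Map storage usage"))                          # haiku
print(pick_model("Profile database query performance"))         # sonnet
print(pick_model("Synthesize findings and prioritize fixes"))   # opus
```
Defaulting to the cheap model and escalating only on clear signals keeps the cost profile predictable as the number of agents grows.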
## Agent Specialization
The agents weren't generic. Each had a specific role and the tools to match.
**homelab-monitor**: System health checks across multiple machines
- Tools: SSH access, system metrics, service status
- Output: Performance bottlenecks, resource usage, thermal state
**security-scanner**: Vulnerability detection and credential exposure
- Tools: File search, regex patterns, permission checks
- Output: Exposed secrets, weak configurations, attack surface analysis
**database-architect**: Database health and optimization
- Tools: SQLite analysis, schema inspection, query profiling
- Output: Fragmentation reports, bloat detection, optimization recommendations
**cost-guardian**: API spend tracking and anomaly detection
- Tools: Log analysis, billing APIs, usage pattern recognition
- Output: Cost breakdown by service, optimization opportunities
**parallel-explorer**: Fast file discovery and pattern matching
- Tools: Glob, grep, size calculations
- Output: Directory inventories, large file identification, duplicate detection
The specialization meant each agent could go deep in its domain without getting distracted. The `security-scanner` didn't need to understand database schemas. The `cost-guardian` didn't need to know about filesystem layouts.
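In Claude Code the specialization lives in the agent definitions themselves; as a rough stand-in, the registry below sketches the idea in Python. The tool lists and prompt wording are assumptions drawn from the roles above, not the actual agent definitions.
```python
# Illustrative registry of specialized agents: each gets a narrow focus
# and only the tools it needs. Tool lists and wording are assumptions.
AGENT_SPECS = {
    "security-scanner": {
        "tools": ["Glob", "Grep", "Bash"],
        "focus": "exposed secrets, weak configurations, attack surface",
    },
    "database-architect": {
        "tools": ["Bash", "Read"],
        "focus": "SQLite fragmentation, schema bloat, optimization",
    },
    "cost-guardian": {
        "tools": ["Read", "Bash"],
        "focus": "API spend by service, usage anomalies",
    },
}

def build_prompt(agent: str, target: str) -> str:
    """Constrain each agent to its domain and to read-only reporting."""
    spec = AGENT_SPECS[agent]
    return (
        f"You are {agent}. Stay strictly within your domain: {spec['focus']}. "
        f"Target: {target}. Report findings only; do not modify anything."
    )
```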
## What the Agents Found
The 40 agents surfaced critical issues across every layer. Not just performance problems. Security vulnerabilities, architectural conflicts, and silent failures.
### VPN Daemon Conflict
**Finding**: Two VPN daemons running simultaneously as root (NordVPN and Cloudflare WARP), both intercepting network traffic.
The `security-scanner` agent found both daemons configured with `KeepAlive=true`, fighting for DNS control. Cloudflare WARP was crash-looping with exit code 78 (configuration error), attempting restart every boot cycle. NordVPN was loaded but unused.
Impact: Network routing conflicts, resource waste, dual attack surface for traffic interception. Two privileged daemons with complete network visibility when only one (Tailscale) was actually needed.
The agent didn't just flag the conflict. It traced the configuration history, identified which daemon was actually in use, and recommended complete removal of the unused services to eliminate the attack surface.
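As a rough illustration of the kind of check involved (not the agent's actual procedure), crash-looping launchd jobs can be spotted from the last exit status that `launchctl list` reports:
```python
# Rough sketch: flag launchd jobs whose last exit status is nonzero
# (Cloudflare WARP's exit code 78 shows up exactly this way).
# Not the agent's actual procedure.
import subprocess

out = subprocess.run(["launchctl", "list"], capture_output=True, text=True).stdout

for line in out.splitlines()[1:]:          # first line is the PID/Status/Label header
    parts = line.split(None, 2)
    if len(parts) != 3:
        continue
    pid, status, label = parts
    if status.lstrip("-").isdigit() and status != "0":
        print(f"{label}: last exit status {status} (possible crash loop)")
```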
### Excessive System Permissions
**Finding**: 81 apps with Accessibility permissions, 24 with AppleEvents control, 2 unexpected apps with camera access.
The `security-scanner` found Warp Terminal and Raycast both had camera permissions (neither needs camera access). It identified 60+ apps with Accessibility permissions that could monitor every keystroke and window on the system.
Each permission multiplies attack surface. A compromised app with Accessibility access can keylog credentials. An app with AppleEvents can control other applications silently.
The agent calculated the actual risk: 60 unnecessary permissions that should be revoked. It identified which apps legitimately needed access (terminal emulators, automation tools) versus which had permissions from one-time use cases years ago.
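One way to spot-check the camera finding yourself is to query the per-user TCC database directly. Treat the snippet below as a sketch rather than a portable tool: it needs Full Disk Access for whatever runs it, and the `access` table's columns vary between macOS versions.
```python
# Sketch: list apps that have requested camera access from the per-user
# TCC database. Requires Full Disk Access; schema varies by macOS version.
import sqlite3
from pathlib import Path

db = Path.home() / "Library/Application Support/com.apple.TCC/TCC.db"
conn = sqlite3.connect(db)
rows = conn.execute(
    "SELECT client FROM access WHERE service = 'kTCCServiceCamera'"
).fetchall()
conn.close()

print(f"{len(rows)} apps have requested camera access:")
for (client,) in rows:
    print(f"  {client}")
```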
### Boot Time Cascade Failures
**Finding**: 52 startup items, 9 with `KeepAlive=true` running permanently, boot time 60-88 seconds.
The `parallel-explorer` agent traced the full dependency chain. Grammarly spawning 4 separate agents. Ollama loading a 600MB language model server whether I needed it or not. Facebook Watchman monitoring filesystem changes for development tools that weren't even running.
But the real issue: cascade conflicts. Docker vmnetd waiting for network. Cloudflare WARP failing and retrying. OneDrive dual-daemon redundancy. Each service adding latency, some blocking others.
Combined impact: 35% of boot time spent on services that either failed, conflicted, or weren't needed at startup.
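A rough sketch of the kind of scan involved: walk the LaunchAgent and LaunchDaemon plists and flag anything configured to stay resident. The handling of `KeepAlive` is simplified here.
```python
# Sketch: list launchd jobs that keep themselves alive. Simplified:
# KeepAlive can also be a dict of conditions, which this treats as "true".
import plistlib
from pathlib import Path

SEARCH_DIRS = [
    Path.home() / "Library/LaunchAgents",
    Path("/Library/LaunchAgents"),
    Path("/Library/LaunchDaemons"),
]

for directory in SEARCH_DIRS:
    for plist_path in sorted(directory.glob("*.plist")):
        try:
            with open(plist_path, "rb") as f:
                job = plistlib.load(f)
        except Exception:
            continue  # skip malformed or unreadable plists
        keep_alive = job.get("KeepAlive", False)
        if keep_alive:  # True, or a non-empty condition dict
            print(f"{job.get('Label', plist_path.stem)}: KeepAlive={keep_alive!r}")
```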
### Configuration Drift Across Machines
**Finding**: Three machines (Mac, GPU server, Pi) with divergent configurations despite documented sync automation.
The `parallel-explorer` agent compared configs across systems via SSH. Found settings.json with different environment variables, MCP server inventories out of sync, skills present on Mac but missing on remotes.
The sync automation existed. It ran weekly. But it wasn't preserving machine-specific settings correctly, causing drift over time.
Impact: Skills that worked on Mac failed silently on GPU server. Automation that depended on remote execution broke unpredictably. No visibility into which machine had which capabilities.
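A minimal drift check can be as simple as hashing the same config file on each machine and comparing. The host aliases below are assumptions; `~/.claude/settings.json` is the file that drifted in the finding above.
```python
# Minimal drift check: hash the same config file on each machine and
# compare. Host aliases are assumptions for the sketch.
import hashlib
import subprocess

HOSTS = ["localhost", "gpu-server", "raspberry-pi"]   # SSH aliases (assumed)
CONFIG = "~/.claude/settings.json"

def remote_hash(host: str, path: str) -> str:
    cmd = ["bash", "-c", f"cat {path}"] if host == "localhost" else ["ssh", host, f"cat {path}"]
    data = subprocess.run(cmd, capture_output=True).stdout
    return hashlib.sha256(data).hexdigest()[:12]

hashes = {host: remote_hash(host, CONFIG) for host in HOSTS}
for host, digest in hashes.items():
    print(f"{host}: {digest}")
if len(set(hashes.values())) > 1:
    print("Configuration drift detected")
```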
### Missing Critical Services
**Finding**: Backup system disabled for 30+ days, vector database unavailable, three automation skills documented but missing from filesystem.
The `homelab-monitor` agent found the Mac's B2 encrypted backup job disabled (`.plist.disabled` suffix). The vector database sync was blocked because the GPU server was offline from a power outage. Three skills referenced in documentation didn't exist in the skills directory.
This is the kind of silent degradation that happens in complex systems. Services stop working. Dependencies break. Nobody notices until they need them.
The agent didn't just report "backup disabled." It traced the dependency chain: backup job depends on Restic, which depends on credentials in `~/.secrets/backblaze.env`, which depends on network access to B2. It verified each step and identified exactly where the failure occurred.
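A sketch of that kind of chain check: verify each link in order, from binary to credentials to network. The paths come from the finding above; the B2 endpoint name is an assumption.
```python
# Sketch: verify each link in the backup dependency chain.
# Paths come from the finding above; the B2 endpoint is an assumption.
import shutil
import socket
from pathlib import Path

checks = {
    "restic installed": shutil.which("restic") is not None,
    "credentials present": (Path.home() / ".secrets/backblaze.env").exists(),
}

try:
    socket.create_connection(("api.backblazeb2.com", 443), timeout=5).close()
    checks["B2 reachable"] = True
except OSError:
    checks["B2 reachable"] = False

for step, ok in checks.items():
    print(f"{'OK ' if ok else 'FAIL'} {step}")
```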
### Hook Duplication Cascading Latency
**Finding**: Claude Code settings with 8x duplicate notification hooks, 6x duplicate formatting hooks, 3x duplicate WebSearch hooks.
The `parallel-explorer` agent parsed the JSON configuration and detected the duplication pattern. Every tool completion triggered 8 identical osascript notifications. Every file edit ran prettier 6 times. Every web search injected the current year 3 times.
Impact: 400-600ms overhead per tool use. Multiply that by hundreds of daily tool invocations and you're adding minutes of latency to every work session.
The fix: consolidate hooks to single instances. But without the agent systematically comparing hook definitions, I never would have noticed identical configurations scattered through the file.
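A sketch of the dedup pass, treating each hook entry as an opaque JSON object so it doesn't depend on the exact settings schema:
```python
# Sketch: drop duplicate hook entries from Claude Code's settings.json.
# Entries are compared as serialized JSON, so this avoids depending on
# the exact hook schema. Back up the file before writing.
import json
from pathlib import Path

settings_path = Path.home() / ".claude/settings.json"
settings = json.loads(settings_path.read_text())

for event, entries in settings.get("hooks", {}).items():
    seen, unique = set(), []
    for entry in entries:
        key = json.dumps(entry, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    removed = len(entries) - len(unique)
    if removed:
        print(f"{event}: removed {removed} duplicate hook entries")
    settings["hooks"][event] = unique

settings_path.write_text(json.dumps(settings, indent=2) + "\n")
```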
## The Synthesis Phase
After all agents completed, the real work began: synthesizing 40 independent reports into coherent findings.
I used a `solution-planner` agent with the `opus` model for this. It read all 40 agent outputs, identified patterns, cross-referenced issues, and built a dependency graph of problems.
For example:
- Memory pressure (35GB/36GB usage) connected to redundant services running at boot
- Boot time issues connected to the number of KeepAlive LaunchAgents
- Disk space issues connected to database version retention and Python duplicates
- Integration failures connected to offline remote systems (GPU server and Pi were down)
The synthesis agent grouped findings by priority (critical, high, medium, low) and created a 4-week optimization plan with specific commands, validation steps, and expected impact.
Two documents generated:
1. **System Analysis** (7,800 lines): Complete findings across all domains
2. **Optimization Plan** (1,000+ lines): Week-by-week tactical remediation
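Mechanically, the synthesis handoff is simple to sketch: collect every report, concatenate them, and hand the whole corpus to the planning agent on the expensive model. The reports directory layout and prompt wording below are assumptions.
```python
# Sketch of the synthesis handoff. The reports layout and prompt wording
# are assumptions; the real run used Claude Code's Task tool with the
# solution-planner agent on opus.
from pathlib import Path

reports_dir = Path("reports")                       # one file per agent (assumed layout)
reports = sorted(reports_dir.glob("wave*_*.md"))

corpus = "\n\n---\n\n".join(p.read_text() for p in reports)
synthesis_prompt = (
    "You are solution-planner. Read the agent reports below, "
    "cross-reference related findings, group them by priority "
    "(critical/high/medium/low), and produce a 4-week remediation plan "
    "with specific commands and validation steps.\n\n" + corpus
)
# Task(subagent_type="solution-planner", prompt=synthesis_prompt, model="opus")
```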
## Lessons on Agent Orchestration
### Parallel by Default
The single biggest performance gain: run agents in parallel unless there's a true dependency.
I initially ran agents sequentially. Wave 1 took 45 minutes. Then I switched to parallel execution (10 agents running simultaneously). Wave 2 took 8 minutes.
The agents were I/O bound (reading files, running commands, parsing logs), not CPU bound. Parallelism meant they could all progress while waiting on filesystem operations.
### Model Selection Matters
Using `haiku` for inventory tasks and `sonnet` for analysis saved both time and cost.
The `parallel-explorer` agent running on `haiku` cost $0.15 to scan 100,000+ files. The same agent on `sonnet` would have cost $3.50. The quality difference for "list all .log files over 100MB" is negligible.
Save the expensive models for tasks requiring reasoning: security analysis, pattern recognition, architectural recommendations.
### Specialization Over Generalization
General-purpose agents get distracted. Specialized agents stay focused.
I tried a "full system audit" agent initially. It produced shallow findings across many domains but missed the deep issues. When I split it into specialized agents (security-scanner, database-architect, cost-guardian), each one went deeper in its area and surfaced specific, actionable problems.
### Context Handoff Between Waves
Later waves benefited from earlier findings. The security-scanner in Wave 2 used the service inventory from Wave 1 to check which daemons had unnecessary privileges. The cost-guardian in Wave 3 used the integration test results from Wave 2 to identify failed API calls.
This requires explicit context passing. Each wave's summary became input for the next wave's planning.
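In practice the handoff can be as simple as prepending the previous wave's summary to the next wave's prompts. A sketch, with the summary file layout assumed:
```python
# Sketch of context handoff between waves: each agent prompt in wave N
# carries the summary produced at the end of wave N-1. The summary file
# layout is an assumption.
from pathlib import Path

def wave_prompts(wave: int, tasks: dict[str, str]) -> dict[str, str]:
    """Prefix each agent task with the previous wave's summary, if any."""
    summary_file = Path(f"reports/wave{wave - 1}_summary.md")
    context = summary_file.read_text() if summary_file.exists() else ""
    prefix = f"Context from Wave {wave - 1}:\n{context}\n\n" if context else ""
    return {agent: prefix + task for agent, task in tasks.items()}

prompts = wave_prompts(2, {
    "security-scanner": "Check which daemons from the service inventory run with unnecessary privileges.",
    "cost-guardian": "Cross-check API spend against the MCP server inventory.",
})
```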
## Implementation in Claude Code
The actual implementation uses Claude Code's `Task` tool with specialized agent types:
```python
# Launch 10 agents in parallel (single message, multiple Task tool calls)
Task(subagent_type="homelab-monitor", prompt="Check system performance...", model="haiku")
Task(subagent_type="parallel-explorer", prompt="Map storage usage...", model="haiku")
Task(subagent_type="security-scanner", prompt="Audit vulnerabilities...", model="haiku")
# ... 7 more
```
Each agent runs independently and returns when complete. The orchestrator waits for all agents to finish before proceeding to the next wave.
The agent types available:
- `homelab-monitor`: Infrastructure health
- `parallel-explorer`: Fast file discovery
- `security-scanner`: Vulnerability detection
- `database-architect`: Database optimization
- `cost-guardian`: API spend analysis
- `search-specialist`: Web research
- `code-reviewer`: Code quality
- `debugger`: Error investigation
- `batch-editor`: Multi-file refactoring
Mix and match based on what you need to analyze.
## The Results
**Storage**: 8-10GB disk space recoverable across multiple categories
- LanceVectorDB versions: 1.5-2.0GB
- Python duplicates: ~2GB
- Homebrew cache: 2-3GB
- Browser caches: 1.5GB
- Temporary files: 600MB
**Performance**: Boot time reduction from 60-88s to 45s (35% improvement)
- Remove Grammarly LaunchAgents: -3-5s
- Defer Ollama to on-demand: -5-8s
- Fix Cloudflare WARP crash loop: -3-5s
- Remove unused services: -4-5s
**Cost**: API spend reduction from $7.69 to $3.25/month (58% reduction)
- Switch meeting notes to Haiku: -$2.97/month
- Reduce polling frequency: -$0.86/month
- Optimize B2 compression: -$0.70/month
**Configuration**: Settings optimization
- Hook consolidation: -400-600ms per tool use
- MCP server startup: -5-30s at session start
- VSCode accessibility: +5-10% startup performance
The agents found issues I never would have discovered manually. Database version bloat? Hook duplication? Python 3.10 still installed? These aren't the kinds of problems you notice. They accumulate silently over time.
## When to Use This Approach
Concentric analysis with parallel agents makes sense when:
**You have complex, interconnected systems**: Multiple services, databases, integrations, automation. Not just a single application.
**You want comprehensive coverage**: Security, performance, cost, configuration. All domains at once rather than point solutions.
**You can define specialization**: You know what questions to ask and can route them to specialized agents. This isn't a black box "audit everything" tool.
**You have time to synthesize**: 40 agent reports don't automatically become actionable insights. Budget time for synthesis and prioritization.
It doesn't make sense for:
- Simple systems with obvious issues (just fix them)
- Single-domain problems (use one specialized agent)
- Systems without clear ownership (who will act on findings?)
## The Code
The full orchestration script is available in my homelab automation repository. Key components:
```bash
#!/bin/bash
# concentric-analysis.sh

# Wave 1: Core Infrastructure
echo "Launching Wave 1: Core Infrastructure (10 agents)"
claude-code task --parallel \
  homelab-monitor="Analyze system performance" \
  parallel-explorer="Map storage usage" \
  security-scanner="Check for vulnerabilities"
  # ... 7 more agents

# Wait for completion, then Wave 2
echo "Launching Wave 2: Security & Remote Systems (10 agents)"
# ... repeat pattern

# Synthesis
echo "Synthesizing findings with solution-planner agent"
claude-code task \
  solution-planner="Analyze all 40 agent reports and create optimization plan" \
  model=opus
```
The actual implementation is more sophisticated (error handling, progress tracking, agent resume on failure), but the core pattern is: define waves, launch agents in parallel, synthesize results.
## Takeaways
**Parallel execution transforms agent utility**. One agent scanning your system takes time. Ten agents running simultaneously cover 10x the ground in the same time.
**Specialization beats generalization**. Domain-specific agents with focused tools find deeper issues than general-purpose audits.
**Synthesis is where value emerges**. Raw findings from 40 agents aren't actionable. The synthesis step (connecting patterns, prioritizing by impact, building remediation plans) is where the work becomes useful.
**Model selection matters for cost**. Using `haiku` for straightforward tasks and reserving `sonnet`/`opus` for reasoning-heavy work makes this economically viable. Running 40 `opus` agents would cost ~$50. Running 40 `haiku` agents with selective `sonnet`/`opus` synthesis costs ~$5.
**The methodology scales**. Four waves on one Mac. Could be four waves on a distributed system. Could be eight waves on a complex infrastructure. The pattern holds: concentric expansion, parallel execution, specialized agents, synthesis.
I went from "my Mac feels slow" to "here are 21 specific issues ranked by priority with exact remediation steps" in a few hours. The agents did the exploration. I did the synthesis and decision-making.
That's the right division of labor.
---
> [!info] Full Analysis Available
> The complete system analysis (7,800 lines) and optimization plan (1,000+ lines) are documented in my homelab vault. If you're interested in the detailed findings or want to adapt the methodology for your own infrastructure, the approach is fully transferable to any Unix-based system.
---
### Related Articles
- [[making-claude-code-more-agentic|Making Claude Code More Agentic: Parallel Execution, Model Routing, and Custom Agents]]
- [[debugging-claude-code-with-claude|Debugging Claude Code with Claude: A Meta-Optimization Journey]]
- [[when-launchagents-attack-100-dollar-api-crash-loop|When LaunchAgents Attack: A $100 API Crash Loop Story]]
---
<p style="text-align: center;"><strong>About the Author</strong>: Justin Johnson builds AI systems and writes about practical AI development.</p>
<p style="text-align: center;"><a href="https://justinhjohnson.com">justinhjohnson.com</a> | <a href="https://twitter.com/bioinfo">Twitter</a> | <a href="https://www.linkedin.com/in/justinhaywardjohnson/">LinkedIn</a> | <a href="https://rundatarun.io">Run Data Run</a> | <a href="https://subscribe.rundatarun.io">Subscribe</a></p>