# Defending Your Homelab and Agent Fleet From npm Supply-Chain Attacks
![[defending-homelab-npm-supply-chain-hero.png]]
Yesterday, May 12, 2026, the TanStack maintainer accounts were compromised. Several packages in the ecosystem got malicious versions pushed under existing names. The community has been calling it Mini Shai-Hulud, a smaller-scoped cousin of the worm-style attacks that hit npm last year. The payload had the familiar shape: a `postinstall` script that scraped environment variables, exfiltrated them to a remote host, and tried to plant itself into adjacent repos through the same maintainer's credentials.
If you run a CI pipeline, an agent fleet, or any machine that did an `npm install` in roughly the twelve hours before the packages were yanked, you should be treating that host as suspect until you've checked it. Not panicked, but checked.
This post is about what we built in the twelve hours after the incident broke. A nightly monitor that watches fifteen heterogeneous hosts, scopes every report to what changed since yesterday, and stays quiet on days when nothing moved. The npm-specific pieces are the new bit. The shape (fan-out, snapshot, diff, score, one report) generalizes to anything you can list.
## Why vibe coding makes the audit surface worse
A few years ago, the typical Node project's `package.json` was small enough that a careful engineer could read it. The transitive tree was bigger, but you mostly trusted the top-line names because you'd put them there yourself.
Vibe coding broke that. When Claude Code or Codex scaffolds a new MCP server, you accept the dependencies it picks. When an agent skill says "you need `puppeteer-extra-plugin-stealth` for this scrape", you accept. When you bolt a Remotion render farm onto a Node service, the install pulls a few thousand packages and you don't even see most of them scroll by. The productivity gain is large, but every accepted suggestion enlarges the attack surface by an amount no one is measuring.
Multiply that by a fleet. A typical setup has a developer workstation, a media-rendering machine, a GPU server, a small always-on home server, a handful of cloud VMs for autonomous agents, and a hosting VPS for public sites. Each one has its own mix of Node-based MCP servers, skill scripts, Remotion projects, and ad-hoc utilities. The set of npm packages installed across all of them is large enough that "audit it manually" was never the answer.
So the question we started from yesterday morning was: given that you can't read everything, and given that supply-chain compromises are now a steady drumbeat, what's the minimum-effort monitoring that moves the risk?
## What the fleet looks like before the monitor
Fifteen hosts, three operating systems, four package managers if you count Homebrew. Node lives in roughly a dozen places per host: one or two global installs, the user's `~/.npm` cache, then a tree per project. On the busiest host alone there are over forty active projects with their own `node_modules`. Snapshotting everything every night sounds expensive. It isn't, but only if you do it right.
The naive answer is "run `npm audit` everywhere on a cron." That fails for three reasons. `npm audit` only flags packages with known CVEs, which means it tells you about yesterday's news, not today's. It produces dense JSON output per project that no one reads. And it gives you zero correlation across hosts, so you can't see that the same suspicious package appeared on three machines within an hour.
The real answer is incremental and correlated. Take a snapshot per host, diff against yesterday, score the diff, and emit one report that names hosts and packages together.
## The twelve-hour build
The architecture has four pieces.
**Inventory.** A single YAML file is the source of truth for which hosts exist and how to reach them. Hostnames, SSH config aliases, and a per-host list of paths to scan (`~/`, `~/apps/`, `~/scripts/`). Everything downstream reads from this file. New host = one entry, no other changes.
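A minimal entry might look something like this (the exact field names are illustrative; `alias` is what the fan-out script below selects on):

```yaml
hosts:
  - hostname: gpu-01.lan
    alias: gpu-server              # matches an entry in ~/.ssh/config
    scan_paths: ["~/", "~/apps/", "~/scripts/"]
  - hostname: vps-web.example.com
    alias: vps-web
    scan_paths: ["~/apps/"]
```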
**Snapshot.** On each host, the nightly job walks the configured paths, finds every `package.json` not buried inside another `node_modules`, and runs `npm ls --all --json --long=false` against each. The output gets normalized into a flat record per package: name, version, resolved URL, integrity hash, install path. The whole snapshot for one host serializes to a gzipped JSON file, typically two to ten megabytes depending on how much Node lives there. Snapshots are kept for thirty days on the host and synced back to a central directory on one designated host for cross-host queries.
```python
import json
import subprocess
from pathlib import Path
from typing import Iterator


def snapshot_project(project_dir: Path) -> list[dict]:
    """Run npm ls and flatten the dependency tree into per-package records."""
    result = subprocess.run(
        ["npm", "ls", "--all", "--json", "--long=false"],
        cwd=project_dir,
        capture_output=True,
        text=True,
        timeout=120,
    )
    # npm ls exits non-zero on peer dep warnings; the JSON is still valid
    tree = json.loads(result.stdout or "{}")
    return list(walk(tree, project_dir))


def walk(node: dict, project_dir: Path, path: tuple = ()) -> Iterator[dict]:
    for name, info in (node.get("dependencies") or {}).items():
        yield {
            "project": str(project_dir),
            "name": name,
            "version": info.get("version"),
            "resolved": info.get("resolved"),
            "integrity": info.get("integrity"),
            "path": " > ".join(path + (name,)),
        }
        yield from walk(info, project_dir, path + (name,))
```
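The filesystem walk around that is a few more lines. A sketch, with `find_projects` and `write_snapshot` as illustrative names, reusing `snapshot_project` from the block above:

```python
import gzip


def find_projects(root: Path) -> Iterator[Path]:
    """Yield project dirs owning a package.json that isn't buried
    inside some other project's node_modules."""
    for pkg in root.expanduser().rglob("package.json"):
        if "node_modules" not in pkg.parts:
            yield pkg.parent


def write_snapshot(scan_paths: list[str], out_path: Path) -> None:
    """Snapshot every project under the configured paths into one gzipped JSON file."""
    records = [
        record
        for path in scan_paths
        for project in find_projects(Path(path))
        for record in snapshot_project(project)
    ]
    with gzip.open(out_path, "wt") as out:
        json.dump(records, out)
```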
**Fan-out.** A wrapper on the central host reads the inventory, kicks off the snapshot script on each host in parallel over SSH, and pulls results back. Timeouts are aggressive because the price of one slow host blocking the others is worse than the price of skipping it. If a host is unreachable, the report says so explicitly rather than silently dropping it.
```bash
#!/usr/bin/env bash
set -euo pipefail
INVENTORY=~/scripts/fleet-inventory.yaml
SNAPSHOT_DIR=~/snapshots/npm
DATE=$(date +%Y-%m-%d)
mkdir -p "$SNAPSHOT_DIR/$DATE"
for host in $(yq '.hosts[].alias' "$INVENTORY"); do
(
    # keep stderr out of the pipe so a warning can't corrupt the gzipped JSON
    timeout 300 ssh -o ConnectTimeout=10 "$host" \
      "~/scripts/npm-snapshot.py" 2>> "$SNAPSHOT_DIR/$DATE/errors.log" \
      | gzip > "$SNAPSHOT_DIR/$DATE/$host.json.gz" \
      || echo "[FAIL] $host" >> "$SNAPSHOT_DIR/$DATE/errors.log"
) &
done
wait
```
**Diff and score.** This is where most of the design effort went. Diffing two large dependency trees naively produces hundreds of changes per host per night, most of which are noise (a transitive dep nudged a patch version). What you want is not the raw diff but the scored diff.
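A minimal sketch of that diff, assuming the flat per-package records from the snapshot step; `diff_snapshots` and the exact change-record fields are illustrative, chosen to line up with the scorer below:

```python
def diff_snapshots(yesterday: list[dict], today: list[dict]) -> list[dict]:
    """Key records by (project, dependency path); emit one change per delta."""
    old = {(r["project"], r["path"]): r for r in yesterday}
    new = {(r["project"], r["path"]): r for r in today}
    changes = []
    for key, rec in new.items():
        prev = old.get(key)
        depth = rec["path"].count(" > ") + 1  # depth 1 = direct dependency
        if prev is None:
            changes.append({"name": rec["name"], "kind": "added",
                            "depth": depth, "new_version": rec["version"]})
        elif prev["version"] != rec["version"]:
            old_major = (prev["version"] or "0").split(".", 1)[0]
            new_major = (rec["version"] or "0").split(".", 1)[0]
            kind = "major_bump" if old_major != new_major else "version_bump"
            changes.append({"name": rec["name"], "kind": kind, "depth": depth,
                            "old_version": prev["version"],
                            "new_version": rec["version"]})
    return changes
```

Each surviving change record then runs through the scorer: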
```python
SCORES = {
"known_bad": 100, # in current IOC feed
"new_maintainer": 40, # ownership changed in last 30 days
"published_recently": 20, # version released in last 72 hours
"major_version_jump": 15,
"new_top_level_dep": 10,
"transitive_patch_bump": 0, # silent
}

# IOC_FEED is the set of package names loaded from the daily indicator feed
def score_change(change: dict, registry_meta: dict) -> int:
    name = change["name"]
meta = registry_meta.get(name, {})
score = 0
if name in IOC_FEED:
score += SCORES["known_bad"]
if meta.get("maintainer_changed_within_days", 999) <= 30:
score += SCORES["new_maintainer"]
if meta.get("published_hours_ago", 999) <= 72:
score += SCORES["published_recently"]
if change["kind"] == "major_bump":
score += SCORES["major_version_jump"]
if change["kind"] == "added" and change["depth"] == 1:
score += SCORES["new_top_level_dep"]
return score
```
Anything scoring 40 or higher goes into the report as a flagged change. Anything between 10 and 39 goes into a "context" section that's collapsible. Anything below 10 doesn't appear at all. On a normal night, the report is half a page and the channel ping is suppressed entirely.
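The thresholding itself is a few lines; a sketch, with `partition_report` as an illustrative name:

```python
FLAG_THRESHOLD = 40     # ping the channel
CONTEXT_THRESHOLD = 10  # render collapsed, no ping

def partition_report(changes: list[dict], registry_meta: dict) -> tuple[list, list]:
    """Split scored changes into flagged and context tiers; drop the rest."""
    flagged, context = [], []
    for change in changes:
        score = score_change(change, registry_meta)
        if score >= FLAG_THRESHOLD:
            flagged.append((score, change))
        elif score >= CONTEXT_THRESHOLD:
            context.append((score, change))
    return flagged, context  # the ping fires only when flagged is non-empty
```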
The IOC feed is two sources stitched together: the public OSV database for npm, and a hand-maintained list of current-incident indicators (right now, the TanStack package set and the exfiltration domains from the postinstall script). The feed refreshes once a day from a known-good source and is itself versioned in a separate repo so the monitor can't be silently poisoned by a compromise of the feed.
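Stitching the feed together is a short loader. A sketch, assuming the OSV advisories sit on disk as one JSON record per file and the incident list is a plain text file with one package name per line:

```python
def load_ioc_feed(osv_dir: Path, incident_file: Path) -> set[str]:
    """Union OSV npm advisories with the hand-maintained incident list."""
    names: set[str] = set()
    for advisory in osv_dir.glob("*.json"):
        record = json.loads(advisory.read_text())
        for affected in record.get("affected", []):
            package = affected.get("package", {})
            if package.get("ecosystem") == "npm":
                names.add(package["name"])
    for line in incident_file.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # allow comments in the list
            names.add(line)
    return names
```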
## Why nightly, not on install
The first instinct when you read about a supply-chain attack is to want detection at install time, before the malicious code runs. That instinct is wrong, or at least incomplete. The install already happened on hundreds of machines around the world before anyone knew the package was bad. What you want to know is: among the things now running on my hosts, did anything change recently in a way that matches the malware-publish pattern?
Nightly is the right cadence for that question. The signals you care about (a new publish, a maintainer rotation, a fresh version of a stable package) settle within hours on the npm registry. By the time a 2 AM cron runs, the registry metadata is stable enough to score against. Running every fifteen minutes would catch the publish slightly faster but at the cost of more false-positive churn from in-flight version bumps and incomplete metadata. The tradeoff is some hours of exposure for a much quieter signal.
This is the same logic from a related piece on [[Blog/AIXplore/Practical Applications/when-launchagents-attack-100-dollar-api-crash-loop|building cost monitoring after a $100 LaunchAgent crash loop]]. The frequency you sample at should match the natural settling time of the signal, not the frequency of the underlying events.
## High signal or silent
The hardest part of any monitoring system isn't the data collection. It's deciding what's worth waking someone up for. If the channel pings every morning with "47 packages updated across the fleet", you stop reading the channel within a week. If it pings only when something scores 40 or above, you read every ping.
The scoring rubric is the place where you encode your tolerance for noise. Ours is deliberately aggressive on the silent side. A patch bump of a transitive dep with no other signals scores zero, full stop. We don't even render it in the report. A new top-level dependency added by an agent during a vibe-coding session scores 10, which means it shows up in the context section but doesn't ping. A package that's both new-to-the-fleet and was published in the last 72 hours scores 30, which still doesn't ping but does get read. Only the combinations that match known attack patterns trip the threshold.
This is the same rule that governs the [[Blog/AIXplore/Practical Applications/concentric-system-analysis-with-claude|40-agent system hardening pass]]: emit findings when there's a finding, stay silent when there isn't. A monitor that reports "all green" trains its operator to ignore it. A monitor that goes quiet for three weeks and then pings once with a real lead is the one you trust.
## What the first 24 hours caught
The monitor went live at 2 AM this morning across all fifteen hosts. The first run produced exactly two flagged changes, both false positives on inspection, and a context section with eleven entries that I read over coffee in about two minutes.
The two flags: a stylelint plugin that had legitimately changed maintainers within the 30-day window (verified by reading the package's GitHub repo and confirming the handoff was announced), and a TypeScript type-definition package that had been republished after a takeover dispute (also legitimate, also resolved). Both took under five minutes to clear. Neither would have shown up in an `npm audit` because neither has a CVE; they showed up here because of the signal combination, not because anyone had flagged them.
The context section was the more interesting read. Two hosts had picked up new direct dependencies from agent activity in the last 24 hours that I hadn't consciously approved. Both turned out to be reasonable (a JSON-schema validator and a small CLI helper), but the fact that they appeared in the report was the right outcome. Vibe coding produces drift, and the monitor surfaces the drift without forcing me to read every commit.
## What this doesn't solve
A few things sit outside this design and deserve to be named.
Transient malware that publishes, gets installed, then either self-destructs or gets yanked by the attacker before 2 AM never shows up. The diff at 2 AM sees only the cleaned state. The only defense against that pattern is install-time scanning, which is a different and much more invasive piece of infrastructure.
Compromises that don't change the version (registry-level account takeovers that overwrite an existing tag) are rare but possible. The integrity hash in `npm ls` would change in that case, and the snapshot captures the hash, so a future version of the scorer should flag "same version, different integrity" as a high-score signal. Not in v1.
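That future check would be a one-line predicate over the same snapshot records; a sketch:

```python
def integrity_drift(prev: dict, cur: dict) -> bool:
    """Same declared version but a different tarball hash: the signature
    of a registry-level overwrite. Would score near known_bad in a v2 rubric."""
    return (
        prev["version"] == cur["version"]
        and prev["integrity"] is not None
        and prev["integrity"] != cur["integrity"]
    )
```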
PyPI, Cargo, Homebrew, GitHub Actions, and Claude Code MCP servers all live outside this monitor. Each has the same shape of problem and roughly the same shape of solution (list, diff, score, report). PyPI is next on the build queue, probably this weekend. The skill manifest scan is third.
And the monitor itself is a piece of software running on every host, which means it has its own supply chain. The Python script depends on `pyyaml` and `requests`, both pinned and integrity-hashed. The IOC feed is checked against a published GPG signature before being trusted. Belt and suspenders, because the worst failure mode for a security monitor is being the attack vector.
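The signature check is a standard detached-signature verification before the feed is ever parsed; a sketch of the shape:

```python
import subprocess
from pathlib import Path


def verify_feed_signature(feed: Path, sig: Path, keyring: Path) -> None:
    """Raise CalledProcessError unless the detached signature verifies
    against the pinned keyring; the monitor refuses the feed on failure."""
    subprocess.run(
        ["gpg", "--no-default-keyring", "--keyring", str(keyring),
         "--verify", str(sig), str(feed)],
        check=True,
        capture_output=True,
    )
```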
## The pattern generalizes
The npm-specific pieces of this build are the IOC feed format, the registry-metadata lookup, and the scoring weights. Everything else is shape that transfers to any other ecosystem where you can produce a deterministic list of what's installed.
For PyPI: `pip list --format=json` plus a `pip show` walk for metadata, scored against OSV's PyPI feed. For Homebrew: `brew info --json=v2 --installed`, scored against deprecation and replacement signals. For Cargo: `cargo install --list` plus crates.io metadata. For your agent fleet's own surface area: `claude mcp list` plus the MCP server's package source, scored against whether the server pulled new transitive deps during last night's npm install.
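In code, the only per-ecosystem collection step is the list command; the commands below are the real CLIs, and the dispatch table around them is illustrative. Normalizing each command's output into the flat record shape is where the real per-ecosystem work lives.

```python
ECOSYSTEM_LIST_COMMANDS = {
    "npm": ["npm", "ls", "--all", "--json"],
    "pypi": ["pip", "list", "--format=json"],
    "homebrew": ["brew", "info", "--json=v2", "--installed"],
    "cargo": ["cargo", "install", "--list"],
}
```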
The discipline is the same in every case. One source of truth for the inventory. A daily snapshot taken on each host. A diff that produces a small number of human-readable changes. A score that decides what's worth interrupting you for. One report, one channel, one threshold below which the channel stays silent.
Supply-chain attacks are not going to slow down. The npm ecosystem in particular has fundamentally re-shaped itself around the assumption that every developer is also a publisher, and every dependency tree contains code from people you'll never meet. Defending against that doesn't require giving up the productivity. It requires noticing the changes, scoring them with discipline, and trusting your channel by keeping it quiet most of the time.
---
### Related Articles
- [[Blog/AIXplore/Practical Applications/concentric-system-analysis-with-claude|Hardening a Production System with 40 Parallel AI Agents]]
- [[Blog/AIXplore/Practical Applications/when-launchagents-attack-100-dollar-api-crash-loop|When LaunchAgents Attack]]
- [[Blog/AIXplore/Practical Applications/my-personal-ai-assistant-clawdbot-seneca|My Personal AI Assistant Lives Everywhere]]
---
<p style="text-align: center;"><strong>About the Author</strong>: Justin Johnson builds AI systems and writes about practical AI development.</p>
<p style="text-align: center;"><a href="https://justinhjohnson.com">justinhjohnson.com</a> | <a href="https://twitter.com/bioinfo">Twitter</a> | <a href="https://www.linkedin.com/in/justinhaywardjohnson/">LinkedIn</a> | <a href="https://rundatarun.io">Run Data Run</a> | <a href="https://subscribe.rundatarun.io">Subscribe</a></p>