# Prompt for Claude Code: Build AI Landscaping Skill

Copy the prompt below and paste it into Claude Code to build your own AI research intelligence skill.

---

## Objective

Create a scalable AI landscaping skill for ongoing research and intelligence gathering. The skill must handle thousands of documents over time, prevent duplicate research, and enable efficient retrieval through indexing and full-text search.

## Core Requirements

### 1. File Structure

Create this exact structure:

```
ai-landscaping/
├── SKILL.md
├── research/
│   ├── INDEX.md                  # Master index with one-line summaries
│   ├── ARCHIVE.md                # Track what we've already researched
│   ├── metadata.db               # SQLite for structured queries
│   └── YYYY-MM/
│       ├── YYYY-MM-DD/
│       │   ├── models.md         # Daily model findings
│       │   ├── papers.md         # Daily paper findings
│       │   ├── tools.md          # Daily tools/platforms
│       │   ├── comparisons.md    # Daily comparative analyses
│       │   └── meta.json         # Structured metadata for the day
│       └── monthly-summary.md    # Month synthesis
├── scripts/
│   ├── init_research_day.py      # Create today's research structure
│   ├── search_research.py        # Ripgrep wrapper with filters
│   ├── update_index.py           # Append to INDEX.md
│   ├── check_duplicate.py        # Check if already researched
│   ├── query_db.py               # SQLite queries
│   └── generate_monthly_summary.py  # Synthesize month's research
└── references/
    ├── search-strategies.md      # Daily search patterns
    └── taxonomy.md               # Classification system
```

### 2. SKILL.md Contents

Create a SKILL.md file with:

**Frontmatter:**

```yaml
name: ai-landscaping
description: AI research intelligence gathering and retrieval system. Use when the user wants to research AI models, papers, tools, or platforms; store findings persistently; search through past research; or get daily AI landscape updates. Prevents duplicate research and enables efficient retrieval across thousands of documents.
```

**Body sections:**

- **Overview**: Purpose and capabilities
- **Daily Research Workflow**:
  - Check ARCHIVE.md to avoid duplicates
  - Execute high-signal searches (see strategies below)
  - Store findings in today's directory
  - Update INDEX.md with one-line summaries
  - Update metadata.db
  - Mark researched items in ARCHIVE.md
- **Search & Retrieval Workflow**:
  - Quick scan: Check INDEX.md
  - Recent check: View last 7 days of research
  - Full search: Use `search_research.py` with ripgrep
  - Structured query: Use `query_db.py` for metadata
- **Anti-Duplication Strategy**: Always check ARCHIVE.md before research
- **File Formats**: Standardized markdown templates
- **Progressive Disclosure**: Reference scripts/ and references/ files
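The tree above references meta.json without pinning down its shape. One minimal illustration, where every field name is a suggestion rather than part of the spec:

```json
{
  "date": "2025-11-04",
  "items_researched": 7,
  "types": {"model": 4, "paper": 2, "tool": 1},
  "sources": ["huggingface", "arxiv", "github"],
  "highlights": ["Qwen-Image", "discrete diffusion survey"]
}
```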
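Similarly, ARCHIVE.md is the anti-duplication backbone, so a rigid, greppable one-line-per-item convention pays off: it lets plain `rg` and check_duplicate.py match on the same `type:name` keys used throughout this prompt. A sketch (format illustrative, not prescribed):

```markdown
# Research Archive
model:Qwen-Image | 2025-10-15 | research/2025-10/2025-10-15/models.md | watch: true
model:Flux-1 | 2025-10-18 | research/2025-10/2025-10-18/models.md
paper:Attention Is All You Need | 2025-10-02 | research/2025-10/2025-10-02/papers.md
```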
### 3. Search Strategies (references/search-strategies.md)

Create intelligent daily searches that maximize signal without repetition:

````markdown
## High-Signal Daily Searches

### Models Research

**Morning Scan (10-15 min research time):**

1. Trending models (exclude already archived):
   - Hugging Face: Top 10 trending models from last 24h
   - Filter: trendingScore, downloads spike >50% week-over-week
   - Check: `rg "model_id" research/ARCHIVE.md` before adding
2. New releases in key categories:
   - Vision: image-to-video, image-generation (check weekly)
   - LLM: text-generation >7B params (check when new)
   - Multimodal: recent VLM/LLM combinations
   - Query: `created_at > [yesterday] AND (task:image-generation OR task:text-generation)`
3. Notable updates to tracked models:
   - Check models in ARCHIVE.md that have the `watch: true` flag
   - Look for new versions, significant download spikes

### Papers Research

**Daily Academic Intelligence:**

1. High-impact papers (arXiv/HF):
   - Papers with >50 citations in first week
   - Papers from top labs (OpenAI, Anthropic, DeepMind, Meta, etc.)
   - Query: `author:openai OR author:anthropic` + last 48h
2. Emerging concepts:
   - Track frequency of terms: "discrete diffusion", "world models", "distillation"
   - Monthly: `rg -c "world models" research/2025-11/` to see the trend
   - Only research papers on NEW concepts not in ARCHIVE.md
3. Implementation-ready papers:
   - Papers with associated HF models/code
   - Reproducibility score >3/5

### Tools & Platforms

**Weekly Deep Dive (pick 1-2 areas/week):**

1. Infrastructure: new ML frameworks, deployment tools
2. Evaluation: benchmarks, leaderboards, quality metrics
3. Productivity: IDE integrations, code assistants
4. Governance: safety tools, alignment research

**Daily Quick Check:**

- HF Spaces with MCP support (new integrations)
- GitHub trending in "machine-learning" (stars >500/day)

### Comparisons

**When to create comparative analyses:**

1. Multiple models solving the same task released within 7 days
2. Significant paradigm shifts (e.g., a new architecture outperforms incumbents)
3. User requests a specific comparison
4. Monthly meta-analysis of a category

## Anti-Duplication Filters

Before ANY search:

```bash
# Check if already researched
python scripts/check_duplicate.py "model:Flux-1"
python scripts/check_duplicate.py "paper:Attention Is All You Need"
```

Search syntax for avoiding duplicates:

```bash
# Skip anything already in the archive (rg exits 0 on a match)
rg --quiet "model_id" research/ARCHIVE.md || echo "model_id not yet researched"

# Find gaps in coverage
# (models with >10k downloads but not in our research)
```

## Daily Routine Template

**Monday-Friday (15-20 min):**

1. Trending models (top 5 new)
2. Key papers (top 3 from top labs)
3. One tool/platform deep dive (rotate)

**Weekend:**

1. Generate weekly summary
2. Create comparisons for related findings
3. Update taxonomy.md with new categories

## Signal Quality Heuristics

**High Signal:**

- New model from an established org with a novel capability
- Paper with >3 citations/day in first week
- Tool that integrates with an existing workflow
- Direct applicability to your domain

**Low Signal (skip):**

- Incremental improvements (<5% on benchmarks)
- Me-too models without differentiation
- Papers without code/reproducibility
- Tools that duplicate existing capabilities

## Search Query Examples

### Hugging Face

```python
# Models
model_search(query="", sort="trendingScore", limit=10)
model_search(query="image-generation", sort="createdAt", limit=5)
model_search(author="meta-llama", sort="downloads")

# Papers
paper_search(query="multimodal distillation", results_limit=5)
paper_search(query="world models", results_limit=3)

# Avoid: broad queries that return 1000s of results
# DON'T: paper_search(query="machine learning")
# DO:    paper_search(query="protein folding transformers")
```

### Web Search (targeted)

```bash
# New releases
"AI model released" + site:huggingface.co + after:2025-11-05

# Benchmarks
"MMLU benchmark" + "2025" + "state of the art"

# Industry applications
"your-domain AI" + "specific-application" + after:2025-11-01
```
````
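The `model_search` calls above assume Hugging Face MCP tools are available in the session. If you want a scriptable fallback, a rough equivalent with the `huggingface_hub` library is sketched below. The sort key is forwarded to the Hub HTTP API, and `"trendingScore"` is the key the Hub website uses; swap in `"downloads"` if your installed version rejects it.

```python
#!/usr/bin/env python3
"""Fetch trending-model candidates (pip install huggingface_hub)."""
from huggingface_hub import list_models

def trending_candidates(limit: int = 10) -> list[str]:
    """Return model IDs to feed into check_duplicate.py before researching."""
    # sort is forwarded to the Hub API; "trendingScore" assumes your
    # huggingface_hub version passes the key through unchanged.
    models = list_models(sort="trendingScore", direction=-1, limit=limit)
    return [m.id for m in models]

if __name__ == "__main__":
    for model_id in trending_candidates():
        print(model_id)
```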
### 4. Database Schema (metadata.db)

Create a SQLite database with this schema:

```sql
CREATE TABLE research_items (
    id INTEGER PRIMARY KEY,
    date TEXT NOT NULL,
    type TEXT NOT NULL,        -- 'model', 'paper', 'tool', 'comparison'
    name TEXT NOT NULL,
    source TEXT,               -- 'huggingface', 'arxiv', 'github', 'web'
    url TEXT,
    summary TEXT,
    tags TEXT,                 -- JSON array
    relevance_score INTEGER,   -- 1-5
    watch BOOLEAN DEFAULT 0,
    file_path TEXT
);

CREATE INDEX idx_date ON research_items(date);
CREATE INDEX idx_type ON research_items(type);
CREATE INDEX idx_name ON research_items(name);
CREATE INDEX idx_watch ON research_items(watch);

CREATE VIRTUAL TABLE research_fts USING fts5(
    name, summary, tags,
    content='research_items'
);
```

(An external-content FTS table needs triggers to stay in sync with its source table; see the sketch after the key scripts below.)

### 5. Key Scripts

**init_research_day.py:**

- Create today's directory structure
- Initialize empty markdown files with templates
- Create meta.json with date metadata

**check_duplicate.py:**

```python
# Usage: check_duplicate.py "model:Qwen-Image"
# Returns: Found in research/2025-10/2025-10-15/models.md
# OR: Not found - safe to research
```

**search_research.py:**

```python
# Ripgrep wrapper with filters
# Usage: search_research.py "diffusion models" --type=papers --last-days=30
```

**update_index.py:**

```python
# Append to INDEX.md with format:
# [2025-11-04] Models: Qwen-Image (image gen), Flux-Kontext (editing) | Papers: Discrete Diffusion review
```
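check_duplicate.py is small enough to sketch in full. This version greps ARCHIVE.md first, then falls back to metadata.db; the paths and the `type:name` key convention are assumptions carried over from this prompt, not requirements:

```python
#!/usr/bin/env python3
"""Check whether an item has already been researched.

Usage: check_duplicate.py "model:Qwen-Image"
"""
import sqlite3
import sys
from pathlib import Path

# Assumes this script lives in ai-landscaping/scripts/
RESEARCH = Path(__file__).resolve().parent.parent / "research"

def check(key: str) -> str | None:
    """Return a location string if `key` (e.g. 'model:Flux-1') is known."""
    kind, _, name = key.partition(":")
    archive = RESEARCH / "ARCHIVE.md"
    # Fast path: literal match against the one-line-per-item archive
    if archive.exists() and key in archive.read_text():
        return f"Found in {archive}"
    db = RESEARCH / "metadata.db"
    if db.exists():
        con = sqlite3.connect(db)
        row = con.execute(
            "SELECT file_path FROM research_items WHERE type = ? AND name = ?",
            (kind, name),
        ).fetchone()
        con.close()
        if row:
            return f"Found in {row[0]}"
    return None

if __name__ == "__main__":
    hit = check(sys.argv[1])
    print(hit if hit else "Not found - safe to research")
```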
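One schema detail worth handing to Claude explicitly: `research_fts` is declared with `content='research_items'`, which makes it an external-content table, and SQLite does not keep those in sync automatically. The standard trigger pattern from the FTS5 documentation, with names matching the schema above:

```sql
-- Keep the FTS index in sync with research_items.
-- (id is an INTEGER PRIMARY KEY, so it aliases rowid.)
CREATE TRIGGER research_ai AFTER INSERT ON research_items BEGIN
  INSERT INTO research_fts(rowid, name, summary, tags)
  VALUES (new.id, new.name, new.summary, new.tags);
END;

CREATE TRIGGER research_ad AFTER DELETE ON research_items BEGIN
  INSERT INTO research_fts(research_fts, rowid, name, summary, tags)
  VALUES ('delete', old.id, old.name, old.summary, old.tags);
END;

CREATE TRIGGER research_au AFTER UPDATE ON research_items BEGIN
  INSERT INTO research_fts(research_fts, rowid, name, summary, tags)
  VALUES ('delete', old.id, old.name, old.summary, old.tags);
  INSERT INTO research_fts(rowid, name, summary, tags)
  VALUES (new.id, new.name, new.summary, new.tags);
END;
```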
### 6. Markdown Templates

**models.md template:**

```markdown
# Models Research - [DATE]

## [Model Name]
- **Source**: [HuggingFace/GitHub/Other]
- **URL**: [link]
- **Type**: [text-gen/image-gen/video/multimodal]
- **Parameters**: [size]
- **Key Innovation**: [1-2 sentences]
- **Performance**: [benchmark results]
- **Relevance**: [1-5] - Why this matters
- **Tags**: #category #application #domain-specific

### Notes
[Detailed analysis, implementation notes, potential use cases]

---
```

**papers.md template:**

```markdown
# Papers Research - [DATE]

## [Paper Title]
- **Source**: [arXiv/Hugging Face Papers]
- **URL**: [link]
- **Authors**: [key authors/institutions]
- **Key Contribution**: [1-2 sentences]
- **Reproducibility**: [code available? data available?]
- **Relevance**: [1-5] - Why this matters
- **Tags**: #research-area #methodology #application

### Summary
[Main findings, methodology, results]

### Implementation Notes
[How to use this research, what it enables]

---
```

**tools.md template:**

```markdown
# Tools Research - [DATE]

## [Tool/Platform Name]
- **Source**: [GitHub/Website]
- **URL**: [link]
- **Category**: [infrastructure/evaluation/productivity/governance]
- **Key Feature**: [What makes it unique]
- **Integration**: [How it fits into workflows]
- **Relevance**: [1-5] - Why this matters
- **Tags**: #tool-category #use-case

### Overview
[What it does, who it's for]

### Integration Strategy
[How to adopt this tool]

---
```

### 7. Intelligence Gathering Principles

Include in references/search-strategies.md:

**The "Already Know" Problem:**

- Maintain ARCHIVE.md as the source of truth
- Before each search, check if the item exists
- Use SQLite for fast "have we seen this?" queries
- Daily: review the last 7 days to avoid re-research

**The "Signal vs Noise" Problem:**

- Focus on: novel capabilities, paradigm shifts, direct applicability
- Skip: incremental improvements, duplicative work, low-impact papers
- Use relevance scoring (1-5) to filter on retrieval

**Domain-Specific Lens** (customize for your field):

- Tag items with applicability to your domain
- Weekly: cross-reference with organizational priorities
- Monthly: generate a domain-specific summary

## Implementation Instructions

1. **Initialize the skill directory:**

   ```bash
   mkdir -p ~/skills/ai-landscaping/{research,scripts,references}
   cd ~/skills/ai-landscaping
   ```

2. **Create all directory structures and files as specified above**

3. **Write all Python scripts with:**
   - Proper error handling and logging
   - Clear usage documentation
   - Type hints and docstrings
   - Test each script independently

4. **Create SKILL.md** following the pattern above

5. **Create example research entries** for 2-3 days to demonstrate the format

6. **Test the workflow:**
   - Run `init_research_day.py` to create today's structure
   - Manually research 3-5 items using the templates
   - Run `update_index.py` to update INDEX.md
   - Run `check_duplicate.py` to verify duplicate detection
   - Run `search_research.py` to test retrieval

7. **Symlink to Claude's skills directory** (if applicable)

## Success Criteria

- Can research 10-15 high-signal items in 15-20 minutes
- Zero duplicate research (ARCHIVE.md prevents this)
- Can search across thousands of documents in <2 seconds
- INDEX.md provides a quick 30-second overview of all research
- Monthly summaries synthesize trends and patterns
- User can ask "what did we learn about diffusion models?" and get an instant answer

## Example Usage

After setup, users should be able to:

```
"Initialize today's research"
→ Creates directory, templates, ready to go

"Check if we've researched Qwen2-VL-72B"
→ Searches ARCHIVE.md and database

"Research trending models from HuggingFace"
→ Fetches, filters, stores, updates index

"Show me all multimodal research from last month"
→ Runs query, presents results

"Generate this month's summary"
→ Analyzes all research, creates synthesis
```

---

## Automation: The Claude Code CLI Approach

The most powerful way to automate this skill is using **Claude Code CLI** instead of just running Python scripts.

### Recommended Setup

Create a task file at `~/skills/ai-landscaping/tasks/daily-research.md`:

```markdown
# Daily AI Research Task

Execute today's AI landscaping research using the ai-landscaping skill:

1. Check ARCHIVE.md to see what we've already researched
2. Find top 5 trending models from last 24 hours (HuggingFace)
3. Find top 3 papers from major labs (arXiv, HF Papers)
4. Find 1-2 noteworthy tools/platforms
5. For each item: check duplicates, apply quality filters, write research entry
6. Update INDEX.md, ARCHIVE.md, and metadata.db
7. Log any issues or notable findings

Be thorough but efficient. Skip low-signal items. Adapt if APIs are slow or unavailable.
```
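Before scheduling anything, run the task once by hand and read the output. Assuming a recent Claude Code CLI (recent releases install the binary as `claude`, and `-p` runs a prompt non-interactively), a smoke test looks like:

```bash
claude -p "$(cat ~/skills/ai-landscaping/tasks/daily-research.md)"
```

If that run produces sensible research entries and index updates, wire it into a scheduler.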
### Schedule with LaunchAgent (macOS)

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.ai-research</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/claude-code</string>
        <string>--task</string>
        <string>/Users/YOUR_USERNAME/skills/ai-landscaping/tasks/daily-research.md</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>1</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
    <key>Nice</key>
    <integer>10</integer>
    <key>StandardOutPath</key>
    <string>/Users/YOUR_USERNAME/skills/ai-landscaping/logs/research.log</string>
    <key>StandardErrorPath</key>
    <string>/Users/YOUR_USERNAME/skills/ai-landscaping/logs/research-error.log</string>
</dict>
</plist>
```

Note: launchd does not expand `~` in plist values, so use absolute paths, and create the `logs/` directory before the first run. Adjust the program path and arguments to match your installed CLI version.

Or with cron (Linux):

```bash
0 1 * * * /usr/local/bin/claude-code --task ~/skills/ai-landscaping/tasks/daily-research.md >> ~/skills/ai-landscaping/logs/research.log 2>&1
```

### Why Claude Code CLI vs Python Scripts?

**Claude Code Approach (Recommended):**

- ✅ Intelligent decision-making (adapts to trends)
- ✅ Graceful error handling (API down? Uses alternatives)
- ✅ Dynamic quality filtering (recognizes hype vs substance)
- ✅ Better summaries (understands context)
- ✅ Uses ALL available tools (MCP servers, web search, file ops)
- ✅ Can adjust strategy mid-execution

**Script-Only Approach:**

- ❌ Rigid, predetermined logic
- ❌ Breaks on unexpected conditions
- ❌ Limited to what you coded
- ❌ Requires manual updates to adapt

The skill provides the framework. Claude Code provides the intelligence.

## Customization Notes

**For your specific domain:**

1. Update search-strategies.md with your domain's sources
2. Modify the relevance scoring criteria
3. Add domain-specific tags
4. Customize the quality heuristics
5. Adjust the daily routine to your research cadence

**For your workflow:**

1. Start manual; validate patterns for 1-2 weeks
2. Create a task file with your specific requirements
3. Set quality thresholds based on your time budget
4. Define "high signal" for your use case
5. Schedule Claude Code CLI execution

This prompt sets up the complete infrastructure. The Claude Code CLI approach lets you run it with real intelligence, not just automation.

---

### Related Articles

- [[building-ai-research-night-shift|My AI Research Assistant Works the Night Shift (A Claude Code Skill Story)]]
- [[Knowledge/Blog-Obsidian/Practical Applications/claude-skills-vs-mcp-servers|Claude Skills vs MCP Servers: Why Context Efficiency Matters]]
- [[elevating-prompt-engineering-with-integrated-tools|Elevating Prompt Engineering with Integrated Tools]]

---

<p style="text-align: center;"><strong>About the Author</strong>: Justin Johnson builds AI systems and writes about practical AI development.</p>

<p style="text-align: center;"><a href="https://justinhjohnson.com">justinhjohnson.com</a> | <a href="https://twitter.com/bioinfo">Twitter</a> | <a href="https://www.linkedin.com/in/justinhaywardjohnson/">LinkedIn</a> | <a href="https://rundatarun.io">Run Data Run</a> | <a href="https://subscribe.rundatarun.io">Subscribe</a></p>