Web Search
Why. Provides web search through the DuckDuckGo metasearch engine and returns ranked results with titles, URLs, and snippets. It is designed to discover URLs for follow-up crawling with web_crawl. Results are cached in memory with a 5-minute TTL.
How it works. Registers the web_search tool. On the first call, it auto-creates .pi/web-search-venv/ and installs ddgs from requirements.txt. Each call writes the Python script to a per-call isolated temp directory (ignore/web-search/search-<random>/), which prevents file races under concurrent calls, and executes it as a bash subprocess via pi.exec. Results are parsed from SEARCH_OK/SEARCH_DONE delimiters and cached in memory (5-minute TTL). Errors propagate as thrown exceptions so the framework can signal isError properly. SIGTERM is handled: on cancellation, the Python subprocess exits cleanly with code 130.
Location: .pi/extensions/web-search/
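As a rough illustration of the delimiter parsing described above, the sketch below assumes the script prints a SEARCH_OK line, then a JSON array, then a SEARCH_DONE line, and on failure a SEARCH_ERROR line followed by a message. The framing, the JSON payload, the function name parseSearchOutput, and the SearchResult fields are all assumptions for illustration; the actual logic lives in executor.ts and types.ts and may differ.

```typescript
// Hypothetical result shape; the real SearchResult type is defined in types.ts.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Assumed framing (not confirmed by the source):
//   SEARCH_OK / <JSON array> / SEARCH_DONE      on success
//   SEARCH_ERROR / <message> / SEARCH_DONE      on failure
// Markers are matched as whole lines so marker text inside a snippet is ignored.
function parseSearchOutput(stdout: string): SearchResult[] {
  const lines = stdout.split(/\r?\n/).map((line) => line.trim());
  const done = lines.lastIndexOf("SEARCH_DONE");
  if (done === -1) {
    throw new Error("web_search: missing SEARCH_DONE marker");
  }

  const ok = lines.indexOf("SEARCH_OK");
  if (ok !== -1 && ok < done) {
    const parsed: unknown = JSON.parse(lines.slice(ok + 1, done).join("\n"));
    if (!Array.isArray(parsed)) {
      throw new Error("web_search: expected a JSON array of results");
    }
    return parsed as SearchResult[];
  }

  const err = lines.indexOf("SEARCH_ERROR");
  if (err !== -1 && err < done) {
    // Throwing lets the caller surface the failure instead of returning empty results.
    throw new Error(`web_search failed: ${lines.slice(err + 1, done).join("\n")}`);
  }

  throw new Error("web_search: no SEARCH_OK or SEARCH_ERROR block found");
}
```

Anything printed before the first marker (pip warnings, for example) is ignored under this framing, which is one reason a delimiter protocol can be more robust than parsing raw stdout.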
Details
Architecture
├── index.ts # Entry: tool registration, cache, concurrency semaphore
├── python-script.ts # Inline Python script (ddgs) as string constant
├── executor.ts # runSearchScript: write temp file, exec subprocess, parse delimited output
├── venv-setup.ts # Auto-create .pi/web-search-venv + pip install ddgs
├── types.ts # SearchCacheEntry, SearchResult types
└── test/ # Executor + parser tests
Execution Flow
flowchart LR
A[tool_call: web_search] --> B[Acquire semaphore: max 5 concurrent]
B --> C{Cache hit?}
C -- yes, within 5min TTL --> D[Return cached results]
C -- miss --> E[ensureWebSearchVenv]
E --> F[write Python script to temp dir]
F --> G[exec python3 script]
G --> H[parseSearchResults: SEARCH_OK / SEARCH_DONE delimiters]
H --> I[cache in memory Map]
I --> J[formatResults: title + URL + snippet]
J --> K[Release semaphore, return]
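The acquire/release steps that bracket the flow above could look roughly like the following. This is a minimal sketch of a counting semaphore, not the code in index.ts; the Semaphore class, its accessors, and the withPermit helper are hypothetical names introduced for illustration.

```typescript
// Minimal counting semaphore (hypothetical; the real one is in index.ts).
class Semaphore {
  private active = 0;
  private readonly queue: Array<() => void> = [];
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  get activeCount(): number {
    return this.active;
  }

  get waitingCount(): number {
    return this.queue.length;
  }

  // Resolves immediately while a slot is free; otherwise waits in FIFO order.
  acquire(): Promise<void> {
    if (this.active < this.limit) {
      this.active++;
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => {
      this.queue.push(resolve);
    });
  }

  release(): void {
    const next = this.queue.shift();
    if (next) {
      next(); // hand the slot straight to the next waiter; active count is unchanged
    } else if (this.active > 0) {
      this.active--;
    }
  }
}

// Runs a task under a permit and always releases it, even if the task throws.
async function withPermit<T>(sem: Semaphore, task: () => Promise<T>): Promise<T> {
  await sem.acquire();
  try {
    return await task();
  } finally {
    sem.release();
  }
}
```

With new Semaphore(5), a sixth concurrent call waits until one of the first five releases its slot. Releasing in a finally block matters here because the tool reports errors by throwing.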
Key Design Decisions
- Per-call isolated temp directory: Each search writes its Python script to ignore/web-search/search-<random>/. This prevents file races under concurrent calls. Cleanup happens via the SIGTERM handler when the tool is cancelled.
- Concurrency semaphore: Max 5 simultaneous searches. This avoids tripping DDGS rate limits and contention on the venv lock file.
- Cache TTL of 5 minutes: In-memory Map<string, SearchCacheEntry> with timestamp-aware expiry. Stale entries are pruned on each set. The cache as a whole is cleared only on session boundaries.
- Python subprocess with ddgs: Uses the minimalist ddgs Python library (DuckDuckGo search, no browser dependency), so there is no Puppeteer/Playwright overhead. The venv is auto-created on first call.
- Delimiter-based output parsing: The script outputs SEARCH_OK/SEARCH_DONE markers for reliable parsing. Errors are captured as SEARCH_ERROR blocks with a message.
- URL encoding for markdown: encodeUrl() encodes parentheses () in URLs to prevent markdown link breakage.
- SIGTERM handling: The Python subprocess exits cleanly with code 130 on cancellation, leaving no orphan processes.
- Max results bounded: Math.min(Math.max(1, maxResults), 50).
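The cache and bounds decisions above can be sketched as follows. This is an illustrative approximation: TtlCache, CacheEntry, and clampMaxResults are hypothetical names, the entry fields are assumed rather than taken from types.ts, and the injectable clock exists only to make expiry easy to test.

```typescript
const TTL_MS = 5 * 60 * 1000; // 5-minute TTL, as stated above

// Hypothetical entry shape; the real SearchCacheEntry is defined in types.ts.
interface CacheEntry<T> {
  value: T;
  storedAt: number;
}

class TtlCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = TTL_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get size(): number {
    return this.entries.size;
  }

  // Returns the cached value only while it is younger than the TTL.
  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  // Prunes every stale entry before storing the new one.
  set(key: string, value: T): void {
    const t = this.now();
    this.entries.forEach((entry, k) => {
      if (t - entry.storedAt >= this.ttlMs) this.entries.delete(k);
    });
    this.entries.set(key, { value, storedAt: t });
  }
}

// Same expression as in the list above: clamps maxResults into [1, 50].
function clampMaxResults(maxResults: number): number {
  return Math.min(Math.max(1, maxResults), 50);
}
```

Pruning on set keeps the map from growing without a background timer, at the cost of a linear scan per insertion, which is a reasonable trade for a small per-session cache.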
Output Format
Search results:
1. [Title](https://example.com)
Snippet text here...
2. [Next Title](https://example.org)
Snippet text here...
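A formatter producing this layout might look like the sketch below. It is an approximation under stated assumptions: the SearchResult fields are hypothetical, the exact whitespace of the real formatResults is not known from this document, and encodeUrl here only percent-encodes parentheses, as described under Key Design Decisions.

```typescript
// Hypothetical result shape; the real SearchResult type is defined in types.ts.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Percent-encodes parentheses so a URL cannot terminate a markdown link early.
function encodeUrl(url: string): string {
  return url.replace(/\(/g, "%28").replace(/\)/g, "%29");
}

// Builds the numbered list shown above: a header line, then one
// "[title](url)" line and one snippet line per result.
function formatResults(results: SearchResult[]): string {
  const items = results.map(
    (r, i) => `${i + 1}. [${r.title}](${encodeUrl(r.url)})\n${r.snippet}`
  );
  return ["Search results:", ...items].join("\n");
}
```

For example, a result whose URL is https://example.com/a_(b) would be rendered with the link target https://example.com/a_%28b%29, so the closing parenthesis of the markdown link stays unambiguous.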
Testing
Tests cover:
- Search result parsing from SEARCH_OK/SEARCH_DONE delimiters
- Error parsing from SEARCH_ERROR blocks
- Empty result sets
- Cache TTL expiry logic
- Venv setup retry and failure modes