# API Reference

## Crawl

Crawl a website and ingest its documentation.

`POST /api/crawl`
Crawl one or more URLs using Puppeteer (Cloudflare Browser Rendering), extract content, chunk it, and ingest it as a library. When a single URL is provided, the crawler auto-discovers links from that page (up to 150 pages).
Content is chunked into approximately 500-word segments with 50-word overlap. Embeddings are auto-generated for libraries with 500 or fewer chunks.
**Auth:** admin API key or session
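The chunker itself is internal to the service, but the scheme described above (≈500-word segments, 50-word overlap, embeddings only for libraries of 500 or fewer chunks) can be sketched in TypeScript. All names here are illustrative, not the service's actual internals:

```ts
// Hypothetical sketch of the documented chunking scheme: ~500-word
// segments with a 50-word overlap. Names are illustrative only.
function chunkWords(
  text: string,
  chunkSize = 500, // words per chunk (documented default)
  overlap = 50,    // words shared between consecutive chunks
): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // last window reached
  }
  return chunks;
}

// Embeddings are auto-generated only when the library stays small enough:
const shouldVectorize = (totalChunks: number) => totalChunks <= 500;
```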
### Request Body
```json
{
  "libraryId": "tailwind",
  "name": "Tailwind CSS",
  "description": "Utility-first CSS framework",
  "urls": ["https://tailwindcss.com/docs"],
  "replace": true
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `libraryId` | string | Yes | Unique identifier for the library |
| `name` | string | Yes | Display name |
| `description` | string | No | Library description |
| `urls` | string[] | Yes | One or more seed URLs to crawl |
| `replace` | boolean | No | If `true`, deletes existing chunks before inserting new ones |
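For TypeScript clients, the body can be typed as follows. This is a convenience sketch derived from the table above, not an official SDK type:

```ts
// Sketch of the request body shape, derived from the field table above.
interface CrawlRequest {
  libraryId: string;    // unique identifier for the library
  name: string;         // display name
  description?: string; // optional library description
  urls: string[];       // one or more seed URLs to crawl
  replace?: boolean;    // if true, delete existing chunks first
}
```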
### Response
```json
{
  "success": true,
  "libraryId": "tailwind",
  "pagesDiscovered": 48,
  "pagesCrawled": 45,
  "chunksIngested": 312,
  "vectorized": true,
  "errors": [
    "https://tailwindcss.com/broken-page: Navigation timeout"
  ]
}
```

| Field | Type | Description |
|---|---|---|
| `success` | boolean | Whether the crawl completed |
| `libraryId` | string | The library ID |
| `pagesDiscovered` | number | Total pages found during link discovery |
| `pagesCrawled` | number | Pages successfully crawled |
| `chunksIngested` | number | Total chunks stored |
| `vectorized` | boolean | Whether embeddings were generated |
| `errors` | string[] | Optional. Per-page errors encountered during crawling |
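The corresponding response type, again as a sketch derived from the table above:

```ts
// Sketch of the response shape, derived from the field table above.
interface CrawlResponse {
  success: boolean;        // whether the crawl completed
  libraryId: string;       // the library ID
  pagesDiscovered: number; // pages found during link discovery
  pagesCrawled: number;    // pages successfully crawled
  chunksIngested: number;  // total chunks stored
  vectorized: boolean;     // whether embeddings were generated
  errors?: string[];       // per-page errors, present only if any occurred
}
```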
### Example

```bash
curl -X POST https://jeremy-app.ian-muench.workers.dev/api/crawl \
  -H "Authorization: Bearer jrmy_your_admin_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "libraryId": "nextjs",
    "name": "Next.js",
    "urls": ["https://nextjs.org/docs"],
    "replace": true
  }'
```
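The same request from TypeScript with `fetch`, using the `CrawlRequest`/`CrawlResponse` sketches above. The API key is a placeholder:

```ts
// Equivalent request using fetch. The API key below is a placeholder.
async function crawlNextjsDocs(): Promise<CrawlResponse> {
  const body: CrawlRequest = {
    libraryId: "nextjs",
    name: "Next.js",
    urls: ["https://nextjs.org/docs"],
    replace: true,
  };
  const res = await fetch("https://jeremy-app.ian-muench.workers.dev/api/crawl", {
    method: "POST",
    headers: {
      Authorization: "Bearer jrmy_your_admin_key_here",
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    // 400, 401, or 500 per the Errors table below
    throw new Error(`Crawl failed: ${res.status} ${await res.text()}`);
  }
  return res.json() as Promise<CrawlResponse>;
}
```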
### Link Discovery

When you provide a single URL, the crawler:
- Navigates to the page and waits for it to fully render
- Extracts all same-origin links, excluding anchors, images, and assets (see the sketch after this list)
- Crawls up to 150 discovered pages
- Strips navigation, footers, sidebars, and other non-content elements
If you provide multiple URLs, each URL is crawled directly without discovery.
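The discovery filter is internal to the crawler, but the same-origin rule and the 150-page cap can be sketched like this. The helper name and the asset-extension list are assumptions, not the crawler's actual code:

```ts
// Hypothetical sketch of the same-origin link filter used during
// discovery. The asset-extension list is an assumption.
const ASSET_EXT = /\.(png|jpe?g|gif|svg|ico|css|js|pdf|zip)$/i;

function discoverableLinks(seed: string, hrefs: string[]): string[] {
  const origin = new URL(seed).origin;
  const seen = new Set<string>();
  for (const href of hrefs) {
    let url: URL;
    try {
      url = new URL(href, seed); // resolve relative links against the seed
    } catch {
      continue; // skip malformed hrefs
    }
    url.hash = ""; // drop anchors so #fragments don't duplicate pages
    if (url.origin !== origin) continue;        // same-origin only
    if (ASSET_EXT.test(url.pathname)) continue; // skip images and assets
    seen.add(url.toString());
    if (seen.size >= 150) break; // documented 150-page cap
  }
  return [...seen];
}
```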
### Errors
| Status | Description |
|---|---|
| 400 | Missing required fields, or no content could be extracted |
| 401 | Missing or insufficient auth (requires admin API key or session) |
| 500 | Internal crawl error |