Jeremy API Reference

Crawl

Crawl a website and ingest its documentation.

POST /api/crawl

Crawl one or more URLs using Puppeteer (Cloudflare Browser Rendering), extract content, chunk it, and ingest it as a library. When a single URL is provided, the crawler auto-discovers links from that page (up to 150 pages).

Content is chunked into approximately 500-word segments with 50-word overlap. Embeddings are auto-generated for libraries with 500 or fewer chunks.
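
As a rough illustration of that scheme, the sketch below splits text on whitespace into 500-word windows that advance by 450 words, so consecutive chunks share 50 words. The function name and signature are hypothetical; this is not the service's actual implementation.

// Hypothetical sketch of ~500-word chunks with 50-word overlap.
// Window arithmetic only; not the service's actual code.
function chunkWords(text: string, size = 500, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // final window reached the end
  }
  return chunks;
}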

Auth: admin API key or session

Request Body

{
  "libraryId": "tailwind",
  "name": "Tailwind CSS",
  "description": "Utility-first CSS framework",
  "urls": ["https://tailwindcss.com/docs"],
  "replace": true
}

Field        Type      Required  Description
libraryId    string    Yes       Unique identifier for the library
name         string    Yes       Display name
description  string    No        Library description
urls         string[]  Yes       One or more seed URLs to crawl
replace      boolean   No        If true, deletes existing chunks before inserting new ones
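
In TypeScript terms, the body corresponds to a shape like the following (the interface name is illustrative; the fields mirror the table above):

interface CrawlRequest {
  libraryId: string;    // unique identifier for the library
  name: string;         // display name
  description?: string; // optional library description
  urls: string[];       // one or more seed URLs to crawl
  replace?: boolean;    // if true, delete existing chunks before inserting
}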

Response

{
  "success": true,
  "libraryId": "tailwind",
  "pagesDiscovered": 48,
  "pagesCrawled": 45,
  "chunksIngested": 312,
  "vectorized": true,
  "errors": [
    "https://tailwindcss.com/broken-page: Navigation timeout"
  ]
}

Field            Type      Description
success          boolean   Whether the crawl completed
libraryId        string    The library ID
pagesDiscovered  number    Total pages found during link discovery
pagesCrawled     number    Pages successfully crawled
chunksIngested   number    Total chunks stored
vectorized       boolean   Whether embeddings were generated
errors           string[]  Optional. Per-page errors encountered during crawling
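
The corresponding response shape, again as an illustrative TypeScript interface:

interface CrawlResponse {
  success: boolean;        // whether the crawl completed
  libraryId: string;       // the library ID
  pagesDiscovered: number; // pages found during link discovery
  pagesCrawled: number;    // pages successfully crawled
  chunksIngested: number;  // total chunks stored
  vectorized: boolean;     // whether embeddings were generated
  errors?: string[];       // per-page errors, present only when pages failed
}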

Example

curl -X POST https://jeremy-app.ian-muench.workers.dev/api/crawl \
  -H "Authorization: Bearer jrmy_your_admin_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "libraryId": "nextjs",
    "name": "Next.js",
    "urls": ["https://nextjs.org/docs"],
    "replace": true
  }'
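
The same request from TypeScript, using the standard fetch API (the key and payload are placeholders, as in the curl example):

const res = await fetch("https://jeremy-app.ian-muench.workers.dev/api/crawl", {
  method: "POST",
  headers: {
    "Authorization": "Bearer jrmy_your_admin_key_here",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    libraryId: "nextjs",
    name: "Next.js",
    urls: ["https://nextjs.org/docs"],
    replace: true,
  }),
});
const result = await res.json(); // matches the Response shape above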

When you provide a single URL, the crawler:

  1. Navigates to the page and waits for it to fully render
  2. Extracts all same-origin links (excluding anchors, images, and assets; sketched below)
  3. Crawls up to 150 discovered pages
  4. Strips navigation, footers, sidebars, and other non-content elements

If you provide multiple URLs, each URL is crawled directly without discovery.
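
A minimal sketch of the link-discovery step (step 2 above), assuming Puppeteer's standard page.evaluate API; seedUrl and page are assumed to be in scope, and the exact filters the service applies may differ:

// Hypothetical sketch: collect same-origin links, dropping anchors and assets.
const seedOrigin = new URL(seedUrl).origin;
const hrefs: string[] = await page.evaluate(() =>
  Array.from(document.querySelectorAll<HTMLAnchorElement>("a[href]"), (a) => a.href)
);
const discovered = [...new Set(
  hrefs
    .map((href) => href.split("#")[0])                                        // drop fragment anchors
    .filter((href) => href.startsWith(seedOrigin))                            // same-origin only
    .filter((href) => !/\.(png|jpe?g|gif|svg|css|js|pdf)(\?|$)/i.test(href))  // skip assets
)].slice(0, 150);                                                             // 150-page cap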

Errors

Status  Description
400     Missing required fields, or no content could be extracted
401     Missing or insufficient auth (requires admin API key or session)
500     Internal crawl error
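
Before parsing the response in the fetch sketch above, a caller might surface these statuses like so (illustrative only):

if (!res.ok) {
  // 400: bad request body or no extractable content; 401: missing or
  // insufficient auth; 500: internal crawl error
  throw new Error(`Crawl failed with status ${res.status}`);
}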