Ingestion Pipeline

Every piece of content you save passes through the same pipeline: capture, queue, classify, extract, create, link, embed. The entire process takes approximately one minute.
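The stage sequence above can be sketched as a simple ordered pipeline. This is an illustrative sketch only — the stage bodies are hypothetical stand-ins, not LoomBrain internals:

```python
# The seven documented stages, run in order. Illustrative sketch only:
# the stage bodies are hypothetical stand-ins, not LoomBrain internals.
STAGES = ["capture", "queue", "classify", "extract", "create", "link", "embed"]

def run_pipeline(raw_input: str) -> dict:
    node = {"input": raw_input, "completed": []}
    for stage in STAGES:
        # A real stage would transform the node; here we only record progress.
        node["completed"].append(stage)
    return node
```

Each stage is described in more detail below.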

Content enters LoomBrain through one of four interfaces:

  • CLI: lb capture <url> or lb capture --stdin for piped text
  • Chrome Extension: toolbar button or right-click context menu
  • MCP: the lb_capture tool called from Claude
  • API: POST /v1/nodes directly

All interfaces accept a "why" note — a short description of why you're saving the content. Providing your intent improves classification accuracy.

The capture request is enqueued to Cloudflare Queues. The API returns immediately with a node ID and status: raw. Processing happens asynchronously.

The queue worker reads the raw input and calls an AI model to determine the content type. Supported types:

| Type | Description |
| --- | --- |
| article | Blog posts, news articles, editorial content |
| tweet | Twitter/X posts and threads |
| video | YouTube and other video pages |
| repo | GitHub and other code repositories |
| pdf | PDF documents |
| audio | Podcast episodes and audio files |
| note | Plain text or manually written notes |
| image | Image files with visual content |

Content type determines which extraction strategy runs next.
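To make the classify-then-dispatch step concrete, here is a sketch using cheap URL heuristics in place of the AI model (the heuristics and the dispatch table values are hypothetical; only the type names come from the table above):

```python
from urllib.parse import urlparse

def classify(url: str) -> str:
    """Cheap URL heuristics standing in for the AI classifier (sketch only)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").removeprefix("www.")
    if host in ("youtube.com", "youtu.be"):
        return "video"
    if host == "github.com":
        return "repo"
    if host in ("twitter.com", "x.com"):
        return "tweet"
    if parsed.path.lower().endswith(".pdf"):
        return "pdf"
    return "article"  # default for ordinary web pages

# The detected type selects the extraction strategy that runs next.
EXTRACTORS = {t: f"extract_{t}" for t in
              ("article", "tweet", "video", "repo", "pdf", "audio", "note", "image")}
```

The real classifier handles ambiguous cases (and non-URL input) that simple heuristics cannot, which is why an AI model is used.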

The extractor pulls structured knowledge from the raw content:

  • Title — canonical page or document title
  • Summary — 2–4 sentence abstract
  • Key points — bulleted list of the most important facts
  • Body — full markdown representation of the content
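The four extracted fields can be pictured as a simple container. The field names come from the list above; the class itself is a hypothetical illustration, not LoomBrain's schema:

```python
from dataclasses import dataclass, field

@dataclass
class Extraction:
    """Container for the four extracted fields (field names from the docs;
    the class itself is hypothetical)."""
    title: str                 # canonical page or document title
    summary: str               # 2-4 sentence abstract
    key_points: list[str] = field(default_factory=list)
    body: str = ""             # full markdown representation
```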

For URLs, the original page is fetched and stored in R2 before extraction. This preserves the source even if the page changes later.
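The snapshot step amounts to writing the fetched bytes to object storage before any extraction runs. A minimal sketch, assuming an in-memory dict in place of the R2 bucket and an assumed key scheme:

```python
import hashlib

archive: dict[str, bytes] = {}  # in-memory stand-in for the R2 bucket

def snapshot(node_id: str, raw_page: bytes) -> str:
    """Store the fetched page before extraction so the source survives
    later changes. The key scheme is an assumption."""
    key = f"snapshots/{node_id}/{hashlib.sha256(raw_page).hexdigest()[:12]}"
    archive[key] = raw_page
    return key
```

Hashing the content into the key is one way to make snapshots tamper-evident and deduplicable; it is shown here as a design sketch, not the documented layout.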

A knowledge node is written to the database with the extracted fields, content type, and status: raw. The node is immediately queryable by ID.

The AI compares the new node against existing nodes in your graph using embeddings and metadata. When it detects a meaningful relationship, it creates a bidirectional link between the two nodes. Links are weighted by confidence.
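Confidence-weighted linking can be sketched with cosine similarity over embeddings. The 0.8 threshold is an assumed value, and the real linker also weighs metadata, not just vectors:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def maybe_link(new_id: str, new_vec: list[float],
               existing: dict[str, list[float]], threshold: float = 0.8) -> list[tuple]:
    """Link the new node to sufficiently similar existing nodes.

    Sketch only: the threshold is assumed, and the real linker also
    uses metadata, not just embeddings."""
    links = []
    for node_id, vec in existing.items():
        confidence = cosine(new_vec, vec)
        if confidence >= threshold:
            # A real implementation would store the edge in both directions.
            links.append((new_id, node_id, confidence))
    return links
```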

A vector embedding is generated from the node’s title, summary, and key points. This embedding powers semantic and hybrid search.
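The text fed to the embedding model is assembled from those three fields. The field choice comes from the docs; the newline joining is an assumption:

```python
def embedding_input(title: str, summary: str, key_points: list[str]) -> str:
    """Assemble the text that feeds the embedding model.

    The docs name the three fields; the newline joining is an assumption."""
    return "\n".join([title, summary, *key_points])
```

Embedding the distilled fields rather than the full body keeps the vector focused on what the content is about, which tends to improve retrieval quality.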

Once embeddings are written, the node becomes fully searchable across all three search modes.

The pipeline accepts four input types regardless of interface:

  • URLs — any publicly accessible web address
  • Files — PDF, images, Word/Google Docs exports
  • Plain text / notes — freeform text, code snippets, meeting notes
  • stdin — piped input via lb capture --stdin

End-to-end processing takes approximately one minute under normal conditions. Heavy AI load or large documents (long PDFs, full-length video transcripts) may take longer.

You can check processing status with lb status <node-id> or by watching the node in the dashboard.
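If you are automating captures, a polling loop is the simplest way to wait for processing to finish. A minimal sketch, where `get_status` is a stand-in for `lb status <node-id>` or an API call, and the timeout and interval defaults are assumptions:

```python
import time

def wait_until_processed(get_status, node_id: str,
                         timeout: float = 120.0, interval: float = 5.0) -> bool:
    """Poll until the node leaves the "raw" status or the timeout expires.

    get_status is a stand-in for `lb status <node-id>` or an API call;
    the timeout and interval defaults are assumptions."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status(node_id) != "raw":
            return True
        time.sleep(interval)
    return False
```

A two-minute default timeout leaves headroom over the typical one-minute processing time for large documents.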