# Ingestion Pipeline
Every piece of content you save passes through the same pipeline: capture, queue, classify, extract, create, link, embed. The entire process takes approximately one minute.
## Pipeline stages

### 1. Capture

Content enters LoomBrain through one of four interfaces:
- CLI: `lb capture <url>` or `lb capture --stdin` for piped text
- Chrome Extension: toolbar button or right-click context menu
- MCP: the `lb_capture` tool called from Claude
- API: `POST /v1/nodes` directly
All interfaces accept a *why* note: a short description of your intent. This improves classification accuracy.
### 2. Queue

The capture request is enqueued to Cloudflare Queues. The API returns immediately with a node ID and `status: raw`. Processing happens asynchronously.
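The enqueue step is what makes capture feel instant: the API does no processing before responding. A sketch of the pattern, with an in-memory list standing in for a Cloudflare Queue and a generated UUID standing in for the real node ID scheme (both assumptions):

```python
import uuid

queue: list[dict] = []  # stand-in for a Cloudflare Queue binding

def enqueue_capture(source: str) -> dict:
    """Create a node ID, enqueue the work, and return immediately."""
    node_id = str(uuid.uuid4())
    queue.append({"node_id": node_id, "source": source})  # consumed async
    return {"id": node_id, "status": "raw"}

resp = enqueue_capture("https://example.com/post")
# The caller holds {"id": ..., "status": "raw"} before any processing runs.
```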
### 3. Classification

The queue worker reads the raw input and calls an AI model to determine the content type. Supported types:
| Type | Description |
|---|---|
| `article` | Blog posts, news articles, editorial content |
| `tweet` | Twitter/X posts and threads |
| `video` | YouTube and other video pages |
| `repo` | GitHub and other code repositories |
| `pdf` | PDF documents |
| `audio` | Podcast episodes and audio files |
| `note` | Plain text or manually written notes |
| `image` | Image files with visual content |
Content type determines which extraction strategy runs next.
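"Type determines strategy" is a plain dispatch: the classifier's label selects one extractor. A minimal sketch under that assumption, with stub extractor functions (the names and the table layout are illustrative, not LoomBrain internals):

```python
def extract_article(raw: str) -> str:
    return "article-extraction"

def extract_pdf(raw: bytes) -> str:
    return "pdf-extraction"

# One entry per supported content type; two shown here.
EXTRACTORS = {
    "article": extract_article,
    "pdf": extract_pdf,
}

def run_extraction(content_type: str, raw):
    """Route raw content to the extractor chosen by classification."""
    try:
        return EXTRACTORS[content_type](raw)
    except KeyError:
        raise ValueError(f"unsupported content type: {content_type}")
```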
### 4. Extraction

The extractor pulls structured knowledge from the raw content:
- Title — canonical page or document title
- Summary — 2–4 sentence abstract
- Key points — bulleted list of the most important facts
- Body — full markdown representation of the content
For URLs, the original page is fetched and stored in R2 before extraction. This preserves the source even if the page changes later.
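The four extracted fields form one record per node. A sketch of that shape as a Python dataclass — the class and field names are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Extraction:
    """Structured knowledge pulled from raw content (illustrative shape)."""
    title: str        # canonical page or document title
    summary: str      # 2-4 sentence abstract
    key_points: list[str] = field(default_factory=list)
    body: str = ""    # full markdown representation of the content

record = Extraction(
    title="Ingestion Pipeline",
    summary="Saved content is classified, extracted, linked, and embedded.",
    key_points=["Processing is asynchronous", "Sources are archived in R2"],
)
```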
### 5. Node creation

A knowledge node is written to the database with the extracted fields, content type, and `status: raw`. The node is immediately queryable by ID.
### 6. Linking

The AI compares the new node against existing nodes in your graph using embeddings and metadata. When it detects a meaningful relationship, it creates a bidirectional link between the two nodes. Links are weighted by confidence.
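One common way to compare nodes by embedding is cosine similarity, with the score doubling as the link's confidence weight. A sketch under that assumption — LoomBrain's actual comparison, metadata signals, and threshold are not specified here:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def propose_links(new_vec, existing, threshold=0.8):
    """Yield (node_id, confidence) for every node above the threshold."""
    for node_id, vec in existing.items():
        score = cosine(new_vec, vec)
        if score >= threshold:
            yield node_id, score

existing = {"n1": [1.0, 0.0], "n2": [0.0, 1.0]}
links = list(propose_links([0.9, 0.1], existing))  # only n1 is similar enough
```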
### 7. Embedding

A vector embedding is generated from the node’s title, summary, and key points. This embedding powers semantic and hybrid search.
Once embeddings are written, the node becomes fully searchable across all three search modes.
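Since the embedding is generated from title, summary, and key points, those fields have to be flattened into one input string first. A sketch of that assembly step — the exact concatenation format is an assumption:

```python
def embedding_input(title: str, summary: str, key_points: list[str]) -> str:
    """Flatten the embedded fields into a single text for the model."""
    points = "\n".join(f"- {p}" for p in key_points)
    return f"{title}\n\n{summary}\n\n{points}"

text = embedding_input(
    "Ingestion Pipeline",
    "Saved content is classified, extracted, linked, and embedded.",
    ["Processing is asynchronous", "Links are weighted by confidence"],
)
```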
## Content sources

The pipeline accepts four input types regardless of interface:
- URLs — any publicly accessible web address
- Files — PDF, images, Word/Google Docs exports
- Plain text / notes — freeform text, code snippets, meeting notes
- stdin — piped input via `lb capture --stdin`
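An interface has to tell these input types apart before it submits anything. A rough heuristic sketch (stdin is a delivery channel rather than a content kind, so it is not a case here; the function and its rules are assumptions, not LoomBrain's detection logic):

```python
from pathlib import Path

FILE_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".docx"}

def detect_input_kind(value: str) -> str:
    """Rough heuristic: URL, file path, or plain text."""
    if value.startswith(("http://", "https://")):
        return "url"
    if Path(value).suffix.lower() in FILE_SUFFIXES:
        return "file"
    return "text"
```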
## Processing time

End-to-end processing takes approximately one minute under normal conditions. Heavy AI load or large documents (long PDFs, full-length video transcripts) may take longer.
You can check processing status with `lb status <node-id>` or by watching the node in the dashboard.
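Since processing is asynchronous, scripts that need the finished node typically poll. A sketch of that loop — `get_status` is a hypothetical stand-in for whatever wraps `lb status <node-id>`, and the `raw` → done transition is the only status detail the docs state:

```python
import time

def wait_until_processed(node_id, get_status, timeout=120.0, interval=5.0):
    """Poll until the node leaves the 'raw' state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(node_id)
        if status != "raw":
            return status
        time.sleep(interval)
    raise TimeoutError(f"node {node_id} still raw after {timeout}s")

# Example with a fake status source that finishes on the third poll:
states = iter(["raw", "raw", "processed"])
result = wait_until_processed("abc123", lambda _id: next(states), interval=0.0)
```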