Ingestion Pipeline

Every piece of content you save passes through the same pipeline: capture, queue, classify, extract, create, link, embed. The entire process takes approximately one minute.
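The stage sequence above can be sketched as a simple ordered pipeline. This is an illustrative sketch only — the stage bodies are hypothetical stand-ins, not LoomBrain internals:

```python
# The seven documented stages, run in order. Illustrative sketch only:
# the stage bodies are hypothetical stand-ins, not LoomBrain internals.
STAGES = ["capture", "queue", "classify", "extract", "create", "link", "embed"]

def run_pipeline(raw_input: str) -> dict:
    node = {"input": raw_input, "completed": []}
    for stage in STAGES:
        # A real stage would transform the node; here we only record progress.
        node["completed"].append(stage)
    return node
```

Each stage is described in more detail below.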

Content enters LoomBrain through one of four interfaces:

  • CLI: lb capture <url> or lb capture --stdin for piped text
  • Chrome Extension: toolbar button or right-click context menu
  • MCP: the lb_capture tool called from Claude
  • API: POST /v1/nodes directly

All interfaces accept a "why" note — a short description of why you're saving the content. Providing your intent improves classification accuracy.

The capture request is enqueued to Cloudflare Queues. The API returns immediately with a node ID and status: raw. Processing happens asynchronously.

The queue worker reads the raw input and calls an AI model to determine the content type. Supported types:

| Type | Description |
| --- | --- |
| article | Blog posts, news articles, editorial content |
| tweet | Twitter/X posts and threads |
| video | YouTube and other video pages |
| repo | GitHub and other code repositories |
| pdf | PDF documents |
| audio | Podcast episodes and audio files |
| note | Plain text or manually written notes |
| image | Image files with visual content |

Content type determines which extraction strategy runs next.
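To make the classify-then-dispatch step concrete, here is a sketch using cheap URL heuristics in place of the AI model (the heuristics and the dispatch table values are hypothetical; only the type names come from the table above):

```python
from urllib.parse import urlparse

def classify(url: str) -> str:
    """Cheap URL heuristics standing in for the AI classifier (sketch only)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").removeprefix("www.")
    if host in ("youtube.com", "youtu.be"):
        return "video"
    if host == "github.com":
        return "repo"
    if host in ("twitter.com", "x.com"):
        return "tweet"
    if parsed.path.lower().endswith(".pdf"):
        return "pdf"
    return "article"  # default for ordinary web pages

# The detected type selects the extraction strategy that runs next.
EXTRACTORS = {t: f"extract_{t}" for t in
              ("article", "tweet", "video", "repo", "pdf", "audio", "note", "image")}
```

The real classifier handles ambiguous cases (and non-URL input) that simple heuristics cannot, which is why an AI model is used.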

The extractor pulls structured knowledge from the raw content:

  • Title — canonical page or document title
  • Summary — 2–4 sentence abstract
  • Key points — bulleted list of the most important facts
  • Body — full markdown representation of the content
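The four extracted fields can be pictured as a simple container. The field names come from the list above; the class itself is a hypothetical illustration, not LoomBrain's schema:

```python
from dataclasses import dataclass, field

@dataclass
class Extraction:
    """Container for the four extracted fields (field names from the docs;
    the class itself is hypothetical)."""
    title: str                 # canonical page or document title
    summary: str               # 2-4 sentence abstract
    key_points: list[str] = field(default_factory=list)
    body: str = ""             # full markdown representation
```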

For URLs, the original page is fetched and stored in R2 before extraction. This preserves the source even if the page changes later.
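The snapshot step amounts to writing the fetched bytes to object storage before any extraction runs. A minimal sketch, assuming an in-memory dict in place of the R2 bucket and an assumed key scheme:

```python
import hashlib

archive: dict[str, bytes] = {}  # in-memory stand-in for the R2 bucket

def snapshot(node_id: str, raw_page: bytes) -> str:
    """Store the fetched page before extraction so the source survives
    later changes. The key scheme is an assumption."""
    key = f"snapshots/{node_id}/{hashlib.sha256(raw_page).hexdigest()[:12]}"
    archive[key] = raw_page
    return key
```

Hashing the content into the key is one way to make snapshots tamper-evident and deduplicable; it is shown here as a design sketch, not the documented layout.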

A knowledge node is written to the database with the extracted fields, content type, and status: raw. The node is immediately queryable by ID.

The AI compares the new node against existing nodes in your graph using embeddings and metadata. When it detects a meaningful relationship, it creates a bidirectional link between the two nodes. Links are weighted by confidence.
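Confidence-weighted linking can be sketched with cosine similarity over embeddings. The 0.8 threshold is an assumed value, and the real linker also weighs metadata, not just vectors:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def maybe_link(new_id: str, new_vec: list[float],
               existing: dict[str, list[float]], threshold: float = 0.8) -> list[tuple]:
    """Link the new node to sufficiently similar existing nodes.

    Sketch only: the threshold is assumed, and the real linker also
    uses metadata, not just embeddings."""
    links = []
    for node_id, vec in existing.items():
        confidence = cosine(new_vec, vec)
        if confidence >= threshold:
            # A real implementation would store the edge in both directions.
            links.append((new_id, node_id, confidence))
    return links
```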

A vector embedding is generated from the node’s title, summary, and key points. This embedding powers semantic and hybrid search.
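The text fed to the embedding model is assembled from those three fields. The field choice comes from the docs; the newline joining is an assumption:

```python
def embedding_input(title: str, summary: str, key_points: list[str]) -> str:
    """Assemble the text that feeds the embedding model.

    The docs name the three fields; the newline joining is an assumption."""
    return "\n".join([title, summary, *key_points])
```

Embedding the distilled fields rather than the full body keeps the vector focused on what the content is about, which tends to improve retrieval quality.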

Once embeddings are written, the node becomes fully searchable across all three search modes.

The pipeline accepts four input types regardless of interface:

  • URLs — any publicly accessible web address
  • Files — PDF, images, Word/Google Docs exports
  • Plain text / notes — freeform text, code snippets, meeting notes
  • stdin — piped input via lb capture --stdin

End-to-end processing takes approximately one minute under normal conditions. Heavy AI load or large documents (long PDFs, full-length video transcripts) may take longer.

You can check processing status with lb status <node-id> or by watching the node in the dashboard.
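If you are automating captures, a polling loop is the simplest way to wait for processing to finish. A minimal sketch, where `get_status` is a stand-in for `lb status <node-id>` or an API call, and the timeout and interval defaults are assumptions:

```python
import time

def wait_until_processed(get_status, node_id: str,
                         timeout: float = 120.0, interval: float = 5.0) -> bool:
    """Poll until the node leaves the "raw" status or the timeout expires.

    get_status is a stand-in for `lb status <node-id>` or an API call;
    the timeout and interval defaults are assumptions."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status(node_id) != "raw":
            return True
        time.sleep(interval)
    return False
```

A two-minute default timeout leaves headroom over the typical one-minute processing time for large documents.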