tl;dr Brainery gets data from sources like Discord and GitHub into a Landing Zone, processes it via MCP and LLMs, and stores it in the TimescaleDB observation_log. MCP also lets you query this data easily.
This guide explains how the Brainery system processes and queries data across multiple sources. We'll explore two main flows: data ingestion and query processing. The system leverages Model Context Protocol (MCP) for structured data handling and TimescaleDB for efficient time-series storage.
Data ingestion flow
The ingestion pipeline processes data from multiple sources into a structured, queryable format. Here's how it works:
sequenceDiagram
participant D as Discord (#tech)
participant M as Memo Blog
participant G as GitHub
participant LZ as Landing Zone (GCP S3)
participant BI as Background Interface (LLM)
participant MCP as MCP Server
participant TS as TimescaleDB
D->>LZ: Posts message in #tech (e.g., "Devs using AI to code")
M->>LZ: Logs user action (e.g., "0x1234 subscribed")
G->>LZ: Records commit (e.g., "AI-generated code added")
Note over LZ: Raw data stored as JSON/CSV in GCP S3 buckets
LZ->>BI: Triggers batch or stream processing
BI->>MCP: Sends MCP request (e.g., "parse_and_store", raw data)
Note over MCP: Executes function: parses data into payload
MCP->>TS: Inserts into observation_log (append-only)
TS-->>MCP: Confirms insertion
MCP-->>BI: Returns success response
System components
graph TD
D[Discord<br>#tech] -->|Messages| LZ[Landing Zone<br>GCP S3]
M[Memo Blog<br>subscribe, mint, etc.] -->|User Actions| LZ
G[GitHub<br>Commits, Issues] -->|Repo Activity| LZ
LZ -->|Raw Parquet| BI[Background Interface<br>LLM]
BI -->|MCP Request: parse_and_store| MCP[MCP Server<br>Model Context Protocol]
MCP -->|Structured Payload| TS[TimescaleDB<br>observation_log]
subgraph Data Sources
D
M
G
end
subgraph Ingestion Pipeline
LZ
BI
MCP
TS
end
How it works
-
Data collection: Raw data flows into the system from three primary sources:
- Discord: Technical discussions and insights from #tech channel
- Memo Blog: User actions like subscriptions and content interactions
- GitHub: Repository activities including commits and issues
-
Landing zone: All raw data is initially stored in GCP S3 buckets in JSON/CSV format, acting as a reliable buffer for incoming data.
-
Processing pipeline:
- The Background Interface (an LLM instance) monitors the landing zone
- When new data arrives, it triggers processing through MCP requests
- The MCP Server parses raw data into structured payloads
- Data is stored in TimescaleDB's observation_log hypertable
-
Storage: TimescaleDB maintains an append-only observation log, ensuring data integrity and auditability.
Query flow
The query flow enables external services to interact with the stored data through an LLM-powered chatbot interface.
sequenceDiagram
participant C as MCP Client (Chatbot LLM)
participant MCP as MCP Server
participant TS as TimescaleDB
C->>MCP: Sends MCP request (e.g., "query_db", "SELECT * FROM coined_term_trends...")
Note over MCP: Executes function: runs SQL query
MCP->>TS: Queries observation_log/aggregates
TS-->>MCP: Returns results (e.g., "Vibe Coding, mention_count: 20")
MCP-->>C: Delivers raw result set
C->>C: Formats response (e.g., "Vibe Coding is trending...")
Note over C: Chatbot presents response to user
Query architecture
graph TD
C[MCP Client<br>Chatbot LLM] -->|MCP Request: query_db| MCP[MCP Server]
MCP -->|SQL Query| TS[TimescaleDB<br>observation_log + aggregates]
subgraph Query Interface
C
MCP
end
subgraph Database Interaction
TS
end
Query process
-
Request handling:
- External services send natural language queries to the MCP Client (chatbot)
- The LLM interprets the request and generates appropriate SQL queries
-
Query execution:
- The MCP Server receives and validates the query request
- Queries are executed against TimescaleDB's observation_log or aggregates
- Results are returned to the MCP Client
-
Response formatting:
- The LLM formats raw data into natural language responses
- Responses are delivered directly to the requesting service
Key benefits
- Structured data flow: The MCP protocol ensures consistent data handling across the system
- Scalable storage: TimescaleDB's hypertable architecture enables efficient time-series data management
- Intelligent interface: LLM-powered chatbot provides natural language access to complex data
- Reliable processing: Append-only logs maintain data integrity and auditability
Implementation notes
- The Background Interface operates as an LLM that uses MCP for data ingestion
- The MCP Server acts as a protocol layer between LLMs and TimescaleDB
- The system maintains append-only logs for data integrity
- All data transformations are handled through MCP requests for consistency
Next: Promote data to insight