# Integration Architecture
Vitamem is a library, not a hosted service. It runs inside your application process, calls the LLM and storage backends you configure, and returns results synchronously. There is no background daemon, no webhook receiver, and no separate infrastructure to manage.
This guide covers where Vitamem fits in a typical stack, how to drive the thread lifecycle, and how to architect multi-session memory for production.
## Where Vitamem Sits in Your Stack
```
┌─────────────────────────────────────────────────┐
│ Your Application (API server, chatbot, etc.)    │
│                                                 │
│  ┌───────────────────────────────────────────┐  │
│  │ Vitamem (library)                         │  │
│  │                                           │  │
│  │  createThread ─ chat ─ retrieve           │  │
│  │  sweepThreads ─ triggerDormantTransition  │  │
│  │  deleteMemory ─ deleteUserData            │  │
│  └─────────┬────────────────┬────────────────┘  │
│            │                │                   │
│       LLM Adapter    Storage Adapter            │
│            │                │                   │
└────────────┼────────────────┼───────────────────┘
             │                │
        ┌────▼────┐    ┌──────▼──────┐
        │ OpenAI  │    │  Supabase   │
        │ Claude  │    │  SQLite     │
        │ Ollama  │    │  Ephemeral  │
        └─────────┘    └─────────────┘
```

Key points:
- No timers run inside Vitamem. Your application is responsible for calling `sweepThreads()` on a schedule.
- No network listener. Vitamem does not open ports or accept inbound connections.
- No global state. Each `createVitamem()` call returns an independent instance. You can run multiple instances with different configurations in the same process.
## Triggering Lifecycle Transitions
There are three patterns for moving threads through their lifecycle: explicit, automatic, and hybrid. Choose based on your application’s architecture.
### Explicit Transitions
Call `triggerDormantTransition()` when your application knows a session has ended (user logs out, navigates away, closes a chat window).
```ts
// User explicitly ends the session
app.post("/api/session/end", async (req, res) => {
  await mem.triggerDormantTransition(req.body.threadId);
  res.json({ ok: true });
});
```

This immediately transitions the thread through active -> cooling -> dormant, runs the embedding pipeline, and saves extracted memories. It does not wait for any timeout.
Best for: applications where you have clear session boundaries (mobile apps, scheduled appointments, chat windows with a “close” button).
### Automatic Transitions with `sweepThreads()`
Call `sweepThreads()` on a recurring schedule. It scans all threads and applies transitions based on the configured timeouts.
```ts
import cron from "node-cron";

// Run every 15 minutes
cron.schedule("*/15 * * * *", async () => {
  await mem.sweepThreads();
});
```

What `sweepThreads()` does on each call:
- Active -> Cooling: threads with no messages for `coolingTimeoutMs` (default: 6 hours) transition to cooling.
- Cooling -> Dormant: threads that have been cooling for `coolingTimeoutMs` transition to dormant. The embedding pipeline runs for each.
- Dormant -> Closed: threads that have been dormant for `closedTimeoutMs` (default: 30 days) transition to closed.
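The timeout rules above can be sketched as a pure transition function. This is an illustrative model only, not Vitamem's actual implementation; the field names (`lastMessageAt`, `coolingStartedAt`, `dormantAt`) are assumptions loosely based on the Supabase schema shown later in this guide:

```typescript
type ThreadState = "active" | "cooling" | "dormant" | "closed";

interface ThreadTimestamps {
  state: ThreadState;
  lastMessageAt: number;      // ms epoch of the last message (assumed field)
  coolingStartedAt?: number;  // set when the thread entered cooling
  dormantAt?: number;         // set when the thread entered dormant
}

// Decide the next state for a single thread, per the rules above.
function nextState(
  t: ThreadTimestamps,
  now: number,
  coolingTimeoutMs: number,
  closedTimeoutMs: number,
): ThreadState {
  switch (t.state) {
    case "active":
      // No messages for coolingTimeoutMs -> cooling
      return now - t.lastMessageAt >= coolingTimeoutMs ? "cooling" : "active";
    case "cooling":
      // Cooling for another coolingTimeoutMs -> dormant
      return now - (t.coolingStartedAt ?? t.lastMessageAt) >= coolingTimeoutMs
        ? "dormant"
        : "cooling";
    case "dormant":
      // Dormant for closedTimeoutMs -> closed
      return now - (t.dormantAt ?? t.lastMessageAt) >= closedTimeoutMs
        ? "closed"
        : "dormant";
    default:
      return t.state;
  }
}
```

A real sweep would load every non-closed thread, compute the next state for each, and persist any changes, running the embedding pipeline on the cooling -> dormant edge.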
Best for: server-side applications that run continuously (API servers, background workers).
### Hybrid (Recommended for Production)
Combine both approaches for the most robust behavior:
```ts
// Explicit: when user ends a session
app.post("/api/session/end", async (req, res) => {
  await mem.triggerDormantTransition(req.body.threadId);
  res.json({ ok: true });
});

// Automatic: catch anything that was missed
cron.schedule("*/15 * * * *", async () => {
  await mem.sweepThreads();
});
```

This way, sessions that end cleanly get immediate memory extraction, while abandoned sessions (browser closed, connection dropped) are cleaned up by the sweep.
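Cron fires on the clock whether or not the previous sweep has finished, and nothing here guarantees `sweepThreads()` is safe to run concurrently with itself. A small overlap guard is a cheap precaution (a sketch; the `sweeping` flag and `guardedSweep` helper are our own, not part of the library):

```typescript
// Module-level flag: true while a sweep is in flight.
let sweeping = false;

// Run the sweep unless one is already running; returns whether it ran.
async function guardedSweep(sweep: () => Promise<void>): Promise<boolean> {
  if (sweeping) return false; // previous sweep still in flight; skip this tick
  sweeping = true;
  try {
    await sweep();
    return true;
  } finally {
    sweeping = false;
  }
}

// Usage with the schedule above:
// cron.schedule("*/15 * * * *", () => guardedSweep(() => mem.sweepThreads()));
```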
## Memory Injection Patterns
When a user returns for a new session, you need to inject their memories into the conversation context. There are two approaches.
### Automatic with `autoRetrieve`
Enable `autoRetrieve: true` in your config. On every `chat()` call, Vitamem will embed the user’s message, search for relevant memories, and inject them as a system message before sending to the LLM.
```ts
const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY,
  storage: "ephemeral",
  autoRetrieve: true,
});

// Memories are automatically injected -- no extra code needed
const { reply, memories } = await mem.chat({
  threadId: thread.id,
  message: "How should I adjust my diet?",
});

// `memories` contains what was injected, for transparency
console.log("Injected memories:", memories);
```

See the auto-retrieve concept doc for details.
### Manual with `retrieve()` + `systemPrompt`
Call `retrieve()` yourself and build a custom system prompt. This gives you full control over formatting, filtering, and what context the LLM sees.
```ts
const memories = await mem.retrieve({
  userId,
  query: "health conditions medications",
  limit: 10,
});

const memoryContext = memories
  .map((m) => `- ${m.content} (${m.source})`)
  .join("\n");

const { reply } = await mem.chat({
  threadId: thread.id,
  message: userMessage,
  systemPrompt: `You are a health companion. Context from past sessions:\n${memoryContext}`,
});
```

## Multi-Session Architecture
Vitamem ties threads together through the `userId` field. One user can have many threads, and memories from all threads are pooled together. When you call `retrieve()`, it searches across all of a user’s memories regardless of which thread they came from.
```
User "user-456"
 ├── Thread A (closed)  ── memories: diabetes, metformin
 ├── Thread B (closed)  ── memories: exercise routine, sleep issues
 ├── Thread C (dormant) ── memories: started physical therapy
 └── Thread D (active)  ── current session
```

When Thread D calls `retrieve()`, it searches across memories from Threads A, B, and C.
```ts
// Each session creates a new thread for the same user
const thread = await mem.createThread({ userId: "user-456" });

// All past memories are available for retrieval
const memories = await mem.retrieve({
  userId: "user-456",
  query: "current medications",
});
```

## Production Deployment with Supabase
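The pooling behavior can be modeled with plain data, no Vitamem APIs: each memory records both the `userId` it belongs to and the `threadId` it was extracted from, but retrieval filters on `userId` alone (a toy illustration, not the library's storage format):

```typescript
interface MemoryRow {
  userId: string;
  threadId: string;
  content: string;
}

// Retrieval scope: every memory for the user, regardless of origin thread.
function memoriesForUser(rows: MemoryRow[], userId: string): string[] {
  return rows.filter((r) => r.userId === userId).map((r) => r.content);
}

const rows: MemoryRow[] = [
  { userId: "user-456", threadId: "A", content: "diabetes" },
  { userId: "user-456", threadId: "B", content: "exercise routine" },
  { userId: "user-999", threadId: "X", content: "another user's memory" },
];

memoriesForUser(rows, "user-456"); // memories from threads A and B; never user-999's
```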
For production, use the `SupabaseAdapter` for durable storage with pgvector-powered semantic search.
```ts
const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY,
  storage: "supabase",
  supabaseUrl: process.env.SUPABASE_URL,
  supabaseKey: process.env.SUPABASE_SERVICE_ROLE_KEY,
  coolingTimeoutMs: 2 * 60 * 60 * 1000, // 2 hours
  closedTimeoutMs: 90 * 24 * 60 * 60 * 1000, // 90 days
  embeddingConcurrency: 3, // conservative for rate limits
  autoRetrieve: true,
});
```

The Supabase adapter expects three tables (`threads`, `messages`, `memories`) and an optional `match_memories` RPC function for server-side vector search. If the RPC is not available, it falls back to client-side cosine similarity.
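The client-side fallback amounts to computing cosine similarity in process and ranking locally, something like the following sketch (our own illustration of the technique; the adapter's actual code may differ). Note that this matches the `1 - (embedding <=> query_embedding)` expression in the RPC below, since pgvector's `<=>` is cosine distance:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored rows against a query embedding, highest similarity first.
function rankBySimilarity<T extends { embedding: number[] }>(
  query: number[],
  rows: T[],
  limit = 10,
): (T & { similarity: number })[] {
  return rows
    .map((r) => ({ ...r, similarity: cosineSimilarity(query, r.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

The trade-off: the fallback must fetch every candidate row and its embedding over the wire, so for large memory sets the server-side RPC is strongly preferable.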
### Supabase Schema (Required Tables)
```sql
create table threads (
  id uuid primary key,
  user_id text not null,
  state text not null default 'active',
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now(),
  last_message_at timestamptz,
  cooling_started_at timestamptz,
  dormant_at timestamptz,
  closed_at timestamptz
);

create table messages (
  id uuid primary key,
  thread_id uuid references threads(id),
  role text not null,
  content text not null,
  created_at timestamptz not null default now()
);

-- Requires pgvector extension
create extension if not exists vector;

create table memories (
  id uuid primary key,
  user_id text not null,
  thread_id uuid references threads(id),
  content text not null,
  source text not null,
  embedding vector(1536),
  created_at timestamptz not null default now()
);

-- Optional: server-side vector search for better performance
create or replace function match_memories(
  query_embedding vector(1536),
  match_user_id text,
  match_limit int default 10
) returns table (content text, source text, similarity float) as $$
  select
    m.content,
    m.source,
    1 - (m.embedding <=> query_embedding) as similarity
  from memories m
  where m.user_id = match_user_id
    and m.embedding is not null
  order by m.embedding <=> query_embedding
  limit match_limit;
$$ language sql stable;
```

## Environment Configuration
```sh
OPENAI_API_KEY=sk-...
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=eyJ...
```
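A missing variable here typically surfaces only when the first LLM or Supabase call fails, so a fail-fast check at startup is worth a few lines (a sketch; the `requireEnv` helper is our own, not part of Vitamem):

```typescript
// Fail fast at startup if any required variable is missing or empty.
function requireEnv(
  env: Record<string, string | undefined>,
  keys: string[],
): Record<string, string> {
  const missing = keys.filter((k) => !env[k]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return Object.fromEntries(keys.map((k) => [k, env[k] as string]));
}

// Usage before createVitamem():
// const cfg = requireEnv(process.env, [
//   "OPENAI_API_KEY",
//   "SUPABASE_URL",
//   "SUPABASE_SERVICE_ROLE_KEY",
// ]);
```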