
Integration Architecture

Vitamem is a library, not a hosted service. It runs inside your application process, calls the LLM and storage backends you configure, and returns results synchronously. There is no background daemon, no webhook receiver, and no separate infrastructure to manage.

This guide covers where Vitamem fits in a typical stack, how to drive the thread lifecycle, and how to architect multi-session memory for production.

┌─────────────────────────────────────────────────┐
│  Your Application (API server, chatbot, etc.)   │
│                                                 │
│  ┌───────────────────────────────────────────┐  │
│  │  Vitamem (library)                        │  │
│  │                                           │  │
│  │  createThread ─ chat ─ retrieve           │  │
│  │  sweepThreads ─ triggerDormantTransition  │  │
│  │  deleteMemory ─ deleteUserData            │  │
│  └─────────┬────────────────┬────────────────┘  │
│            │                │                   │
│       LLM Adapter     Storage Adapter           │
│            │                │                   │
└────────────┼────────────────┼───────────────────┘
             │                │
        ┌────▼────┐    ┌──────▼──────┐
        │ OpenAI  │    │  Supabase   │
        │ Claude  │    │  SQLite     │
        │ Ollama  │    │  Ephemeral  │
        └─────────┘    └─────────────┘

Key points:

  • No timers run inside Vitamem. Your application is responsible for calling sweepThreads() on a schedule.
  • No network listener. Vitamem does not open ports or accept inbound connections.
  • No global state. Each createVitamem() call returns an independent instance. You can run multiple instances with different configurations in the same process.
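Because instances are independent, you can run, for example, an ephemeral instance for previews alongside a durable one for production traffic in the same process. A minimal sketch, reusing only configuration options shown elsewhere in this guide:

```javascript
// Two independent Vitamem instances in one process (illustrative sketch;
// option names mirror the examples in this guide).
const previewMem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY,
  storage: "ephemeral", // throwaway memory, nothing persisted
});

const prodMem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY,
  storage: "supabase", // durable memory with semantic search
  supabaseUrl: process.env.SUPABASE_URL,
  supabaseKey: process.env.SUPABASE_SERVICE_ROLE_KEY,
  autoRetrieve: true,
});
// The two instances share no state; sweeping one never touches the other.
```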

Driving the Thread Lifecycle

There are three patterns for moving threads through their lifecycle: explicit, automatic, and hybrid. Choose based on your application’s architecture.

Explicit Transitions

Call triggerDormantTransition() when your application knows a session has ended (user logs out, navigates away, closes a chat window).

// User explicitly ends the session
app.post("/api/session/end", async (req, res) => {
  await mem.triggerDormantTransition(req.body.threadId);
  res.json({ ok: true });
});

This immediately transitions the thread through active -> cooling -> dormant, runs the embedding pipeline, and saves extracted memories. It does not wait for any timeout.

Best for: applications where you have clear session boundaries (mobile apps, scheduled appointments, chat windows with a “close” button).

Automatic Sweeping

Call sweepThreads() on a recurring schedule. It scans all threads and applies transitions based on configured timeouts.

import cron from "node-cron";

// Run every 15 minutes
cron.schedule("*/15 * * * *", async () => {
  await mem.sweepThreads();
});

What sweepThreads() does on each call:

  1. Active -> Cooling: Threads with no messages for coolingTimeoutMs (default: 6 hours) transition to cooling.
  2. Cooling -> Dormant: Threads that have been cooling for coolingTimeoutMs transition to dormant. The embedding pipeline runs for each.
  3. Dormant -> Closed: Threads that have been dormant for closedTimeoutMs (default: 30 days) transition to closed.

Best for: server-side applications that run continuously (API servers, background workers).
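The sweep rules above can be sketched as a pure function over a thread record. This is an illustration of the documented timeout logic, not Vitamem's actual implementation; the field names (state, lastMessageAt, coolingStartedAt, dormantAt) are assumptions modeled on the storage schema shown later in this guide.

```javascript
// Illustrative sketch of the sweep's transition rules (not Vitamem internals).
// Given a thread record and the current time, return the next state,
// or null if no transition applies on this sweep.
function nextState(thread, now, { coolingTimeoutMs, closedTimeoutMs }) {
  if (thread.state === "active" && now - thread.lastMessageAt >= coolingTimeoutMs) {
    return "cooling"; // 1. idle too long -> start cooling
  }
  if (thread.state === "cooling" && now - thread.coolingStartedAt >= coolingTimeoutMs) {
    return "dormant"; // 2. cooled long enough -> dormant (embedding pipeline runs)
  }
  if (thread.state === "dormant" && now - thread.dormantAt >= closedTimeoutMs) {
    return "closed"; // 3. dormant long enough -> closed
  }
  return null; // no transition this sweep
}

// Example: a thread idle for 7 hours against the 6-hour default
const defaults = {
  coolingTimeoutMs: 6 * 60 * 60 * 1000,
  closedTimeoutMs: 30 * 24 * 60 * 60 * 1000,
};
const now = Date.now();
nextState({ state: "active", lastMessageAt: now - 7 * 60 * 60 * 1000 }, now, defaults);
// -> "cooling"
```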

Hybrid

Combine both approaches for the most robust behavior:

// Explicit: when the user ends a session
app.post("/api/session/end", async (req, res) => {
  await mem.triggerDormantTransition(req.body.threadId);
  res.json({ ok: true });
});

// Automatic: catch anything that was missed
cron.schedule("*/15 * * * *", async () => {
  await mem.sweepThreads();
});

This way, sessions that end cleanly get immediate memory extraction, while abandoned sessions (browser closed, connection dropped) are cleaned up by the sweep.

Retrieving Memories in New Sessions

When a user returns for a new session, you need to inject their memories into the conversation context. There are two approaches.

Automatic Retrieval

Enable autoRetrieve: true in your config. On every chat() call, Vitamem embeds the user’s message, searches for relevant memories, and injects them as a system message before sending to the LLM.

const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY,
  storage: "ephemeral",
  autoRetrieve: true,
});

// Memories are automatically injected -- no extra code needed
const { reply, memories } = await mem.chat({
  threadId: thread.id,
  message: "How should I adjust my diet?",
});

// `memories` contains what was injected, for transparency
console.log("Injected memories:", memories);

See the auto-retrieve concept doc for details.

Manual Retrieval

Call retrieve() yourself and build a custom system prompt. This gives you full control over formatting, filtering, and what context the LLM sees.

const memories = await mem.retrieve({
  userId,
  query: "health conditions medications",
  limit: 10,
});

const memoryContext = memories
  .map((m) => `- ${m.content} (${m.source})`)
  .join("\n");

const { reply } = await mem.chat({
  threadId: thread.id,
  message: userMessage,
  systemPrompt: `You are a health companion. Context from past sessions:\n${memoryContext}`,
});

Multi-Session Memory

Vitamem ties threads together through the userId field. One user can have many threads, and memories from all threads are pooled together. When you call retrieve(), it searches across all of a user’s memories regardless of which thread they came from.

User "user-456"
├── Thread A (closed) ── memories: diabetes, metformin
├── Thread B (closed) ── memories: exercise routine, sleep issues
├── Thread C (dormant) ── memories: started physical therapy
└── Thread D (active) ── current session

When Thread D calls retrieve(), it searches across memories from Threads A, B, and C.

// Each session creates a new thread for the same user
const thread = await mem.createThread({ userId: "user-456" });

// All past memories are available for retrieval
const memories = await mem.retrieve({
  userId: "user-456",
  query: "current medications",
});

Production Storage

For production, use SupabaseAdapter for durable storage with pgvector-powered semantic search.

const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY,
  storage: "supabase",
  supabaseUrl: process.env.SUPABASE_URL,
  supabaseKey: process.env.SUPABASE_SERVICE_ROLE_KEY,
  coolingTimeoutMs: 2 * 60 * 60 * 1000, // 2 hours
  closedTimeoutMs: 90 * 24 * 60 * 60 * 1000, // 90 days
  embeddingConcurrency: 3, // conservative for rate limits
  autoRetrieve: true,
});

The Supabase adapter expects three tables (threads, messages, memories) and an optional match_memories RPC function for server-side vector search. If the RPC is not available, it falls back to client-side cosine similarity.

create table threads (
  id uuid primary key,
  user_id text not null,
  state text not null default 'active',
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now(),
  last_message_at timestamptz,
  cooling_started_at timestamptz,
  dormant_at timestamptz,
  closed_at timestamptz
);

create table messages (
  id uuid primary key,
  thread_id uuid references threads(id),
  role text not null,
  content text not null,
  created_at timestamptz not null default now()
);

-- Requires the pgvector extension
create extension if not exists vector;

create table memories (
  id uuid primary key,
  user_id text not null,
  thread_id uuid references threads(id),
  content text not null,
  source text not null,
  embedding vector(1536),
  created_at timestamptz not null default now()
);

-- Optional: server-side vector search for better performance
create or replace function match_memories(
  query_embedding vector(1536),
  match_user_id text,
  match_limit int default 10
) returns table (content text, source text, similarity float) as $$
  select
    m.content,
    m.source,
    1 - (m.embedding <=> query_embedding) as similarity
  from memories m
  where m.user_id = match_user_id
    and m.embedding is not null
  order by m.embedding <=> query_embedding
  limit match_limit;
$$ language sql stable;
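When the match_memories RPC is unavailable, the adapter falls back to client-side cosine similarity. Roughly, that computation looks like the following (an illustrative sketch, not the adapter's actual code):

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank candidate memories against a query embedding, highest similarity first.
function rankMemories(queryEmbedding, memories, limit = 10) {
  return memories
    .filter((m) => m.embedding != null) // skip rows without embeddings
    .map((m) => ({ ...m, similarity: cosineSimilarity(queryEmbedding, m.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

This score matches the RPC's `1 - (m.embedding <=> query_embedding)`, since pgvector's `<=>` operator is cosine distance.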
.env
OPENAI_API_KEY=sk-...
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=eyJ...