
createVitamem

The main factory function. Takes a single config object and returns a Vitamem instance.

```ts
async function createVitamem(config: VitamemConfig): Promise<Vitamem>;
```

Provide either a `provider` string shortcut or an `llm` adapter instance.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `provider` | `"openai" \| "anthropic" \| "ollama"` | | Creates a built-in adapter (requires `apiKey` for cloud providers) |
| `apiKey` | `string` | | API key for the provider |
| `model` | `string` | Provider default | Chat model override |
| `extractionModel` | `string` | Same as `model` | Model for memory extraction. Use a cheaper/faster model for extraction while keeping a more capable chat model. |
| `embeddingModel` | `string` | Provider default | Embedding model override |
| `baseUrl` | `string` | Provider default | API base URL (for proxies, self-hosted) |
| `llm` | `LLMAdapter` | | Custom adapter instance (overrides `provider`) |

Provider defaults:

| Provider | Chat Model | Embedding Model |
| --- | --- | --- |
| `openai` | `gpt-5.4-mini` | `text-embedding-3-small` |
| `anthropic` | `claude-sonnet-4-20250514` | `text-embedding-3-small` (via OpenAI) |
| `ollama` | `llama3.2` | `nomic-embed-text` |

Provide either a string shortcut or a `StorageAdapter` instance.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `storage` | `"ephemeral" \| "supabase" \| StorageAdapter` | required | Storage backend |
| `supabaseUrl` | `string` | | Required when `storage: "supabase"` |
| `supabaseKey` | `string` | | Required when `storage: "supabase"` |
Lifecycle and behavior options:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `preset` | `PresetName` | | Named timeout preset (`"daily-checkin"`, `"weekly-therapy"`, `"on-demand"`, `"long-term"`). Explicit timeout values override preset values. |
| `coolingTimeoutMs` | `number` | `21600000` (6h) | Inactivity before active → cooling in `sweepThreads()` |
| `dormantTimeoutMs` | `number` | Same as `coolingTimeoutMs` | Time in cooling before cooling → dormant in `sweepThreads()` |
| `closedTimeoutMs` | `number` | `2592000000` (30d) | Time in dormant before auto-close in `sweepThreads()` |
| `embeddingConcurrency` | `number` | `5` | Max concurrent embedding API calls |
| `autoRetrieve` | `boolean` | `false` | Inject relevant memories into every `chat()` call |
| `structuredExtractionRules` | `StructuredExtractionRule[]` | | Rules for classifying extracted facts into structured profile fields. Use `HEALTH_STRUCTURED_RULES` for health domains. |
Retrieval options:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `onRetrieve` | `(memories, query) => MemoryMatch[]` | | Hook to filter or reorder memories after the retrieval pipeline runs |
| `minScore` | `number` | `0` | Minimum cosine similarity score for retrieved memories (`0` = no filtering) |
| `recencyWeight` | `number` | `0` | Blend factor (0–1) between cosine similarity and recency. `0` = pure cosine, `1` = pure recency. |
| `recencyMaxAgeMs` | `number` | `7776000000` (90d) | Normalization window for recency scoring |
| `diversityWeight` | `number` | `0` | MMR diversity weight (0–1). `0` = standard top-K; higher values promote diversity. |
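The recency blend can be pictured as a linear interpolation between the cosine score and an age-based score. This is an illustrative sketch, not vitamem's internal code; the exact normalization the library uses may differ:

```ts
// Illustrative recency blending: (1 - w) * cosine + w * recency,
// where recency decays linearly from 1 (now) to 0 (recencyMaxAgeMs old).
function blendScore(
  cosine: number,
  ageMs: number,
  recencyWeight: number,
  recencyMaxAgeMs: number = 7_776_000_000, // 90-day default
): number {
  const recency = Math.max(0, 1 - ageMs / recencyMaxAgeMs);
  return (1 - recencyWeight) * cosine + recencyWeight * recency;
}
```

With `recencyWeight: 0` the score is the raw cosine similarity; with `1`, only the memory's age matters.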
Extraction and memory options:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `extractionPrompt` | `string` | | Top-level extraction prompt override. Forwarded to the adapter when using the `provider` shortcut; overrides the adapter's default health-focused prompt. |
| `memoryContextFormatter` | `(memories: MemoryMatch[], query: string) => string` | | Custom formatter for auto-retrieve memory injection. Replaces the default bullet-point format. |
| `deduplicationThreshold` | `number` | `0.92` | Cosine similarity threshold for exact duplicate detection. Facts with similarity above this are discarded. |
| `supersedeThreshold` | `number` | `0.75` | Cosine similarity threshold for memory supersede. Facts with similarity between this and `deduplicationThreshold` update the existing memory in place (e.g., A1C 7.4% → 6.8%). |
| `autoPinRules` | `AutoPinRule[]` | | Rules that automatically pin critical memories during extraction. Use the built-in `HEALTH_AUTO_PIN_RULES` for health domains. |
| `forgetting` | `ForgettingConfig` | `undefined` | Enable active forgetting with a decay model. If not set, decay is disabled. |
| `forgetting.forgettingHalfLifeMs` | `number` | `15552000000` (180d) | Time in ms until an unretrieved memory's relevance halves |
| `forgetting.minRetrievalScore` | `number` | `0.1` | Score threshold below which memories become archival candidates |
| `enableReflection` | `boolean` | `false` | Enable a second LLM call to validate and enrich extracted facts |
| `reflectionPrompt` | `string` | built-in | Custom prompt override for the reflection LLM call |
| `prioritySignaling` | `boolean` | `true` | Prepend priority markers (`[CRITICAL]`, `[IMPORTANT]`, `[INFO]`) to each memory line based on source and pinned status |
| `chronologicalRetrieval` | `boolean` | `true` | Sort retrieved memories by `createdAt` and group by month/year with date headers |
| `cacheableContext` | `boolean` | `false` | Split memory context into a stable prefix (profile + pinned) and a dynamic suffix (retrieved) for LLM caching |
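The two similarity thresholds partition each incoming fact into one of three bands. A sketch of that decision, using the default thresholds (the `classifyBySimilarity` helper is hypothetical, not part of the vitamem API, and the exact boundary handling at the threshold values is an assumption):

```ts
type FactDisposition = "duplicate" | "supersede" | "new";

// Illustrative threshold bands, using the documented defaults:
//   similarity > 0.92          -> discard as an exact duplicate
//   0.75 < similarity <= 0.92  -> supersede the existing memory in place
//   similarity <= 0.75         -> store as a new memory
function classifyBySimilarity(
  similarity: number,
  deduplicationThreshold = 0.92,
  supersedeThreshold = 0.75,
): FactDisposition {
  if (similarity > deduplicationThreshold) return "duplicate";
  if (similarity > supersedeThreshold) return "supersede";
  return "new";
}
```

This is why an updated lab value ("A1C is now 6.8%") lands in the middle band: similar enough to the stored fact to refer to the same thing, but not similar enough to be a verbatim repeat.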
```ts
const vm = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
  autoRetrieve: true,
  memoryContextFormatter: (memories, query) =>
    `Known facts about this user:\n${memories.map(m => m.content).join("\n")}`,
});
```
```ts
import { createVitamem, HEALTH_AUTO_PIN_RULES } from "vitamem";

const vm = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
  autoPinRules: HEALTH_AUTO_PIN_RULES,
});

// "Allergic to penicillin" → automatically pinned
// "Blood type A+" → automatically pinned
```
```ts
// Minimal — string shortcuts
const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
});
```

```ts
// Local models
const mem = await createVitamem({
  provider: "ollama",
  storage: "ephemeral",
});
```

```ts
// Custom adapter + Supabase
const mem = await createVitamem({
  llm: myCustomAdapter,
  storage: "supabase",
  supabaseUrl: process.env.SUPABASE_URL!,
  supabaseKey: process.env.SUPABASE_KEY!,
});
```

```ts
// Full config
const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  model: "gpt-4o",
  storage: "supabase",
  supabaseUrl: process.env.SUPABASE_URL!,
  supabaseKey: process.env.SUPABASE_KEY!,
  coolingTimeoutMs: 6 * 60 * 60 * 1000,
  closedTimeoutMs: 30 * 24 * 60 * 60 * 1000,
  embeddingConcurrency: 10,
  autoRetrieve: true,
});
```

Returns a `Vitamem` instance with the following methods.


createThread()

Creates a new conversation thread in the active state.

```ts
const thread = await mem.createThread({ userId: "user-123" });
// thread.state === 'active'
```

chat()

Sends a message in a thread and returns the AI reply. Automatically reactivates cooling threads.

```ts
const { reply, thread, memories } = await mem.chat({
  threadId: thread.id,
  message: "I take metformin daily for my diabetes.",
  systemPrompt: "You are a health companion.", // optional
});
```
| Option | Type | Description |
| --- | --- | --- |
| `threadId` | `string` | The thread to send the message in |
| `message` | `string` | The user's message |
| `systemPrompt` | `string?` | Optional system prompt prepended to context |

Returns: `{ reply: string, thread: Thread, memories?: MemoryMatch[], previousThreadId?: string, redirected?: boolean }`

When autoRetrieve is enabled, memories contains the memories that were injected into context. If the thread was dormant or closed, a new thread is created and redirected is true with previousThreadId set.
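Callers that cache a thread ID should account for redirects. A self-contained sketch of that bookkeeping, assuming only the return shape documented above (the `ChatResult` interface and `threadIds` map here are illustrative, not part of vitamem):

```ts
// Minimal shape of the chat() return fields relevant to redirect handling.
interface ChatResult {
  reply: string;
  thread: { id: string };
  previousThreadId?: string;
  redirected?: boolean;
}

// userId -> currently cached thread ID (illustrative client-side cache)
const threadIds = new Map<string, string>();

// Cache the resolved thread ID and report whether a redirect happened,
// so the caller can e.g. refresh any UI tied to the old thread.
function trackThread(userId: string, result: ChatResult): boolean {
  threadIds.set(userId, result.thread.id);
  return result.redirected === true;
}
```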


chatStream()

Streaming variant of chat(). Returns an AsyncGenerator that yields response tokens as they are generated. The full reply is saved to storage after the stream completes.

```ts
const { stream, thread, memories } = await mem.chatStream({
  threadId: thread.id,
  message: "What medications am I taking?",
  systemPrompt: "You are a health companion.", // optional
});

for await (const chunk of stream) {
  process.stdout.write(chunk);
}
```
| Option | Type | Description |
| --- | --- | --- |
| `threadId` | `string` | The thread to send the message in |
| `message` | `string` | The user's message |
| `systemPrompt` | `string?` | Optional system prompt prepended to context |

Returns: `Promise<{ stream, thread, memories?, previousThreadId?, redirected? }>`

| Field | Type | Description |
| --- | --- | --- |
| `stream` | `AsyncGenerator<string>` | Yields response tokens one by one |
| `thread` | `Thread` | The resolved thread (may be a new thread if redirected) |
| `memories` | `MemoryMatch[]?` | Memories injected into context (when `autoRetrieve` is enabled) |
| `previousThreadId` | `string?` | Original thread ID if redirected from dormant/closed |
| `redirected` | `boolean?` | `true` if a new thread was created due to dormant/closed state |

chatWithUser()

Convenience method that resolves or creates a thread for the user, then calls chat().

```ts
const { reply, thread } = await mem.chatWithUser({
  userId: "user-123",
  message: "How has my blood pressure been?",
});
```

chatWithUserStream()

Streaming variant of chatWithUser(). Resolves or creates a thread, then calls chatStream().

```ts
const { stream, thread } = await mem.chatWithUserStream({
  userId: "user-123",
  message: "How has my blood pressure been?",
});

for await (const chunk of stream) {
  process.stdout.write(chunk);
}
```

Returns: Same shape as chatStream().


retrieve()

Searches a user's stored memories using semantic similarity.

```ts
const memories = await mem.retrieve({
  userId: "user-123",
  query: "medications and health conditions",
  limit: 5,
});
// Returns MemoryMatch[] sorted by similarity score descending
```

pinMemory()

Pins a memory so it is always included in retrieval results (score 1.0) and exempt from active forgetting decay.

```ts
await mem.pinMemory("memory-abc");
```

unpinMemory()

Removes the pinned status from a memory.

```ts
await mem.unpinMemory("memory-abc");
```

getOrCreateThread()

Returns the user's latest active or cooling thread, or creates a new one if none exists.

```ts
const thread = await mem.getOrCreateThread("user-123");
// thread.state === 'active' or 'cooling'
```

getThread()

Returns the thread object, or null if not found.

```ts
const thread = await mem.getThread("thread-abc");
console.log(thread?.state); // 'active' | 'cooling' | 'dormant' | 'closed'
```

triggerDormantTransition()

Transitions a thread to dormant (via cooling if currently active) and runs the embedding pipeline (extract, reflect, classify, embed, deduplicate, save).

```ts
const stats = await mem.triggerDormantTransition(thread.id);
console.log(`Saved: ${stats.memoriesSaved}, Superseded: ${stats.memoriesSuperseded}, Deduped: ${stats.memoriesDeduped}`);
```

Returns: `Promise<{ memoriesSaved, memoriesDeduped, memoriesSuperseded, totalExtracted, profileFieldsUpdated }>`


closeThread()

Archives a thread. Only valid from the dormant state. Memories remain searchable.

```ts
await mem.closeThread(thread.id);
// Thread is now 'closed'
```

sweepThreads()

Checks all threads and applies lifecycle transitions based on the configured timeouts. Call this on a schedule (e.g., setInterval, cron).

```ts
// Check every minute
setInterval(() => mem.sweepThreads(), 60_000);
```

Transitions applied:

  • active → cooling if no messages for coolingTimeoutMs
  • cooling → dormant if cooling for dormantTimeoutMs (runs embedding pipeline)
  • dormant → closed if dormant for closedTimeoutMs
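Each check amounts to comparing elapsed time against the configured threshold. An illustrative sketch of the first transition (not vitamem's internal implementation; the `lastMessageAt` parameter name is an assumption):

```ts
// Illustrative: decide whether an active thread should cool, given the
// timestamp of its last message and the configured inactivity timeout.
function isCoolingDue(
  lastMessageAt: number, // epoch ms of the thread's last message
  now: number,
  coolingTimeoutMs: number = 21_600_000, // 6h default
): boolean {
  return now - lastMessageAt >= coolingTimeoutMs;
}
```

The cooling → dormant and dormant → closed checks follow the same pattern against `dormantTimeoutMs` and `closedTimeoutMs`.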

deleteMemory()

Deletes a single memory by ID.

```ts
await mem.deleteMemory("memory-abc");
```

deleteUserData()

Deletes all memories for a user. Use for GDPR right-to-erasure requests.

```ts
await mem.deleteUserData("user-123");
```

getProfile()

Returns the user's structured profile, or null if profile storage is not supported or no profile exists.

```ts
const profile = await mem.getProfile("user-123");
if (profile) {
  console.log(profile.allergies); // ['penicillin']
  console.log(profile.vitals);    // { a1c: { value: 6.8, unit: '%' } }
}
```

Returns: UserProfile | null


updateProfile()

Updates the user's structured profile with merge semantics. Creates the profile if it doesn't exist. No-op if the storage adapter does not support profiles.

```ts
await mem.updateProfile("user-123", {
  conditions: ["Type 2 diabetes"],
  allergies: ["penicillin"],
});
```
| Option | Type | Description |
| --- | --- | --- |
| `userId` | `string` | The user whose profile to update |
| `updates` | `Partial<Omit<UserProfile, "userId">>` | Fields to merge into the profile |

In addition to the Vitamem facade, the library exports several utility functions for advanced use cases.

applyDecay()

Apply time-based decay scoring to memory matches. Returns re-scored and re-sorted memories with adjusted relevance. Pinned memories are exempt from decay.

```ts
import { applyDecay } from "vitamem";

const scored = applyDecay(memoryMatches, {
  forgettingHalfLifeMs: 15552000000, // 180 days
});
```
| Param | Type | Description |
| --- | --- | --- |
| `results` | `MemoryMatch[]` | Memory matches to apply decay to |
| `config` | `{ forgettingHalfLifeMs?: number }` | Decay configuration |

Returns: MemoryMatch[] — Re-scored and re-sorted by decayed score.
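Half-life decay means an unretrieved memory's relevance halves every `forgettingHalfLifeMs`. A sketch of that curve, assuming a standard exponential half-life model (vitamem's exact formula may differ):

```ts
// Exponential half-life decay: factor = 0.5 ^ (age / halfLife).
// A memory untouched for one half-life keeps 50% of its score,
// for two half-lives 25%, and so on.
function decayFactor(ageMs: number, halfLifeMs: number): number {
  return Math.pow(0.5, ageMs / halfLifeMs);
}
```

Under this model, at the default 180-day half-life a memory last retrieved a year ago would keep roughly 0.5^(365/180) ≈ 25% of its score.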


shouldArchive()

Check if a memory should be archived based on its decay score falling below `minRetrievalScore`. Pinned memories always return false.

```ts
import { shouldArchive } from "vitamem";

const archive = shouldArchive(memory, {
  minRetrievalScore: 0.1,
  forgettingHalfLifeMs: 15552000000,
});
```
| Param | Type | Description |
| --- | --- | --- |
| `memory` | `Memory` | The memory to evaluate |
| `config` | `{ minRetrievalScore?: number; forgettingHalfLifeMs?: number }` | Archive thresholds |

Returns: `boolean` — `true` if the memory's decay score falls below the threshold.


reflectOnExtraction(extractedFacts, existingMemories, originalMessages, llm, customPrompt?)


Run a second LLM call to validate extracted facts against existing memories and conversation. Checks accuracy, detects conflicts, catches missed facts, and enriches vague facts. If reflection fails (e.g., invalid JSON), returns original facts unchanged so the pipeline is never broken.

```ts
import { reflectOnExtraction } from "vitamem";

const result = await reflectOnExtraction(
  extractedFacts,
  existingMemories,
  conversationMessages,
  llmAdapter,
);
```
| Param | Type | Description |
| --- | --- | --- |
| `extractedFacts` | `ExtractedFact[]` | Facts from the extraction step |
| `existingMemories` | `Array<{ content: string; source: string }>` | User's current stored memories |
| `originalMessages` | `Array<{ role: string; content: string }>` | The original conversation |
| `llm` | `{ chat: (...) => Promise<string> }` | LLM adapter with a chat method |
| `customPrompt` | `string?` | Optional custom system prompt |

Returns: Promise<ReflectionResult>


applyReflectionResult()

Convert a ReflectionResult into a flat `ExtractedFact[]` for the pipeline. Filters out facts with action `"remove"` and merges in missed facts.

```ts
import { applyReflectionResult } from "vitamem";

const finalFacts = applyReflectionResult(reflectionResult);
```
| Param | Type | Description |
| --- | --- | --- |
| `result` | `ReflectionResult` | The result from `reflectOnExtraction()` |

Returns: ExtractedFact[] — Cleaned, enriched facts ready for the embedding pipeline.


classifyStructuredFacts()

Classify extracted facts into structured profile fields using pattern-matching rules.

```ts
import { classifyStructuredFacts, HEALTH_STRUCTURED_RULES } from "vitamem";

const structuredFacts = classifyStructuredFacts(extractedFacts, HEALTH_STRUCTURED_RULES);
```

applyStructuredFacts()

Apply classified structured facts to a user profile with set/add/remove semantics.

```ts
import { applyStructuredFacts } from "vitamem";

const updatedProfile = applyStructuredFacts(currentProfile, structuredFacts);
```

| Export | Description |
| --- | --- |
| `PRESETS` | Built-in preset configurations |
| `canTransition(from, to)` | Check if a thread state transition is valid |
| `transition(thread, to)` | Apply a state transition to a thread |
| `shouldCool(thread, timeoutMs)` | Check if a thread should transition to cooling |
| `shouldGoDormant(thread, timeoutMs)` | Check if a thread should transition to dormant |
| `reactivate(thread)` | Reactivate a cooling thread back to active |
| `extractMemories(messages, llm, sessionDate?)` | Extract memories from messages using an LLM |
| `cosineSimilarity(a, b)` | Compute cosine similarity between two vectors |
| `isDuplicate(a, b, threshold)` | Check if two embeddings are duplicates |
| `deduplicateFacts(facts, existing)` | Deduplicate extracted facts against existing memories |
| `validateExtraction(memories)` | Validate extracted memory structure |
| `runEmbeddingPipeline(...)` | Run the full embedding pipeline (extract, embed, dedup, save) |
| `applyRecencyWeighting(results, weight, maxAge)` | Apply recency-based score weighting |
| `applyMMR(candidates, weight, limit)` | Apply Maximal Marginal Relevance for diverse retrieval |
| `EphemeralAdapter` | In-memory storage adapter |
| `SupabaseAdapter` | Supabase-backed storage adapter |
| `createOpenAIAdapter(opts)` | Factory for OpenAI LLM adapter |
| `createAnthropicAdapter(opts)` | Factory for Anthropic LLM adapter |
| `createOllamaAdapter(opts)` | Factory for Ollama LLM adapter |
| `HEALTH_AUTO_PIN_RULES` | Built-in auto-pin rules for health domains |
| `HEALTH_STRUCTURED_RULES` | Built-in structured extraction rules for health domains |
| `createEmptyProfile(userId)` | Create an empty UserProfile with default values |
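Several of these helpers are plain numeric utilities. For example, cosine similarity between two embedding vectors can be sketched as follows (an illustrative implementation, not vitamem's source; the exported `cosineSimilarity(a, b)` may differ in details such as zero-vector handling):

```ts
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for real vectors.
// Embedding similarity scores are typically in [0, 1] in practice.
function cosineSim(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```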