Memory Extraction

When a thread transitions to dormant, Vitamem extracts the meaningful facts from the conversation using your LLM adapter. This is the core operation that turns raw conversation history into a searchable memory store.

Extraction converts raw conversation into structured, searchable facts. Instead of storing entire message histories verbatim, Vitamem distills each session down to the handful of statements that matter — stripping away greetings, filler, and back-and-forth clarification. Each fact is classified as confirmed or inferred, giving downstream systems a confidence signal they wouldn’t have with raw messages.

The real power of extraction compounds over time. As sessions accumulate, deduplication and supersede keep the memory store lean instead of growing linearly:

| Session | Raw conversation | Memory store result |
| --- | --- | --- |
| 1 | "I'm learning Rust and want to ship my first CLI by March" | New facts saved: "Learning Rust", "Goal: ship CLI by March" |
| 5 | User mentions Rust again in passing | Deduplicated — already stored, nothing added |
| 12 | "Finished the CLI, now building a web API in Rust" | Supersedes the old goal in-place |

After 30 daily sessions, a user’s memory store might contain ~20 refined facts rather than thousands of raw messages. Without extraction, memory grows with every conversation; with it, the store converges on a concise, current knowledge base that is fast to search, cheap to embed, and keeps LLM context windows lean — only extracted facts are searched and retrieved, not raw conversation history.

The extraction pipeline calls llm.extractMemories(messages) with the full conversation history. Your LLM adapter is responsible for returning an array of structured facts:

```typescript
// What the LLM adapter returns
[
  { content: "Prefers TypeScript over JavaScript", source: "confirmed" },
  { content: "Deploys on Vercel", source: "confirmed" },
  { content: "Working on a React Native mobile app", source: "confirmed" },
  { content: "Prefers concise explanations", source: "inferred" },
]
```

Facts are classified by how confident we are they represent ground truth:

| Source | Meaning | Example |
| --- | --- | --- |
| `confirmed` | User explicitly stated this | "I prefer TypeScript" |
| `inferred` | Derived from context or assistant messages | User seems to prefer concise explanations |

This classification helps downstream systems apply appropriate confidence when using memories to personalize responses.

When using the built-in adapter factories (createOpenAIAdapter, createAnthropicAdapter, createOllamaAdapter), Vitamem includes a domain-agnostic default extraction prompt that works across all use cases.

The default prompt extracts personal details, preferences, goals, habits, and important context — anything the user would expect to be remembered next time. Facts are tagged with categories like "preference", "goal", "personal", "professional", "lifestyle", or "general".

For health-specific use cases, Vitamem exports HEALTH_EXTRACTION_PROMPT and HEALTH_EXTRACTION_PROMPT_ANTHROPIC — these include structured profile field support for conditions, medications, allergies, vitals, and goals. See the Health Companion Guide for details.

The {conversation} placeholder is replaced with the formatted conversation history at extraction time.
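To picture that substitution, here is a minimal sketch. The `Message` type and `buildExtractionPrompt` helper are hypothetical illustrations, not part of Vitamem's API:

```typescript
// Hypothetical sketch: format the message history and substitute it into
// the prompt template at the {conversation} placeholder.
type Message = { role: "user" | "assistant"; content: string };

function buildExtractionPrompt(template: string, messages: Message[]): string {
  const conversation = messages
    .map((m) => `${m.role}: ${m.content}`)
    .join("\n");
  // String.prototype.replace with a string pattern replaces the first
  // occurrence, which is enough since the placeholder appears once.
  return template.replace("{conversation}", conversation);
}
```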

You can use a cheaper or faster model for memory extraction while keeping a more capable model for chat responses. Set extractionModel in your config:

```typescript
const vm = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  model: "gpt-4o", // capable model for chat
  extractionModel: "gpt-4o-mini", // cheaper model for extraction
  storage: "ephemeral",
});
```

This is useful when extraction doesn’t require the full reasoning power of your chat model. Since extraction happens in the background during dormant transitions, using a cheaper model can significantly reduce costs without impacting chat quality.

You can set extractionPrompt directly in the top-level VitamemConfig. When using the provider shortcut, this is forwarded to the built-in adapter automatically:

```typescript
const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
  extractionPrompt: `You are a fitness coach assistant. Extract facts about this person's
fitness journey from the conversation.
Focus on:
- Exercise routines and frequency
- Personal records and progress
- Injuries or physical limitations
- Nutritional preferences and diet plans
Conversation:
{conversation}
Return a JSON array only:
[{ "content": "specific factual statement", "source": "confirmed" | "inferred" }]`,
});
```

Alternatively, you can pass extractionPrompt to the adapter factory directly:

```typescript
import { createVitamem, createOpenAIAdapter } from "vitamem";

const llm = createOpenAIAdapter({
  apiKey: process.env.OPENAI_API_KEY!,
  chatModel: "gpt-4o",
  extractionPrompt: `You are a fitness coach assistant. Extract facts about this person's
fitness journey from the conversation.
Focus on:
- Exercise routines and frequency
- Personal records and progress
- Injuries or physical limitations
- Nutritional preferences and diet plans
- Competition goals and timelines
Conversation:
{conversation}
Return a JSON array only:
[{ "content": "specific factual statement", "source": "confirmed" | "inferred" }]
- "confirmed" = user directly stated this
- "inferred" = derived from context
- Skip greetings and small talk
- Include specific numbers (weights, reps, times) when mentioned`,
});

const mem = await createVitamem({
  llm,
  storage: "ephemeral",
});
```

Your custom extraction prompt must:

  1. Include {conversation} where the formatted messages should be inserted
  2. Instruct the LLM to return a JSON array of { content, source } objects
  3. Define what "confirmed" and "inferred" mean for your domain

Beyond those requirements, a few practices improve extraction quality:

  • Be domain-specific. A fitness app should extract different facts than a tutoring assistant.
  • List categories. Explicitly tell the LLM what kinds of facts to look for.
  • Include negative examples. Tell the LLM what to skip (greetings, questions, opinions).
  • Request specificity. Ask for numbers, dates, and dosages instead of vague statements.

If you need complete control over extraction logic (not just the prompt), implement the LLMAdapter interface directly:

```typescript
import { createVitamem } from "vitamem";
import type { LLMAdapter } from "vitamem";

const customLLM: LLMAdapter = {
  async chat(messages) {
    // Your chat implementation
  },
  async extractMemories(messages) {
    // Your custom extraction logic
    // Can use multiple LLM calls, rule-based filtering, etc.
  },
  async embed(text) {
    // Your embedding implementation
  },
};

const mem = await createVitamem({
  llm: customLLM,
  storage: "ephemeral",
});
```

Vitamem also exports a simple rule-based extractor (extractFactsSimple) that does not require an LLM. It classifies user messages as confirmed and assistant messages as inferred, looking for sentences containing personal pronouns and action verbs.

```typescript
import { extractFactsSimple } from "vitamem";

const facts = extractFactsSimple(messages);
// [{ content: "I prefer TypeScript", source: "confirmed" }, ...]
```

This is useful for testing but produces lower-quality memories than LLM extraction.

Vitamem uses structured output when available to ensure reliable JSON parsing from LLM responses during memory extraction.

When extracting memories, Vitamem requests JSON-formatted responses from the LLM provider:

  • OpenAI / DashScope / Ollama: Uses response_format: { type: "json_object" } to guarantee valid JSON output
  • Anthropic: Uses prompt-based JSON guidance with post-validation

All responses are parsed through a validation layer that:

  1. Accepts both wrapper objects ({ "memories": [...] }) and bare arrays for backward compatibility
  2. Validates each entry has a content string and a valid source (“confirmed” or “inferred”)
  3. Strips invalid entries silently — partial extraction is better than total failure
| Provider | JSON Mode | Schema Enforcement |
| --- | --- | --- |
| OpenAI | `json_object` | `json_schema` available for gpt-4o+ |
| DashScope | `json_object` | Not supported — prompt-based |
| Anthropic | Prompt-based | `output_config.format` available (GA) |
| Ollama | `json_object` | Native `format` supports schemas |
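The validation steps described above can be sketched roughly as follows. This is a hypothetical helper for illustration, not Vitamem's actual implementation:

```typescript
type ExtractedFact = { content: string; source: "confirmed" | "inferred" };

// Sketch of the validation layer: accept either a bare array or a
// { memories: [...] } wrapper, keep only well-formed entries, and
// silently drop the rest (partial extraction beats total failure).
function parseExtractionResponse(raw: string): ExtractedFact[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return []; // unparseable response: extract nothing rather than throw
  }
  // Unwrap { memories: [...] } wrapper objects; otherwise expect a bare array.
  const items = Array.isArray(parsed)
    ? parsed
    : (parsed as { memories?: unknown }).memories;
  if (!Array.isArray(items)) return [];
  return items.filter(
    (m): m is ExtractedFact =>
      typeof m === "object" &&
      m !== null &&
      typeof (m as any).content === "string" &&
      ((m as any).source === "confirmed" || (m as any).source === "inferred"),
  );
}
```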

Vitamem uses a two-table architecture to handle different types of user data optimally:

Critical structured data (like health metrics, medications, or allergies in a health app) can be extracted into a structured profile with direct field updates. This means:

  • Value updates are immediate — no embedding comparison needed
  • Entries are deduplicated by field — adding the same item twice results in one entry
  • Profile data is always available — injected into every conversation automatically

Contextual information like lifestyle habits, preferences, and conversation insights continue through the embedding pipeline with vector-based deduplication and supersede.

During extraction, each fact is classified by pattern matching:

  1. Facts matching structured data patterns → Profile update (no embedding needed)
  2. All other facts → Embedding pipeline (existing dedup + supersede logic)
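The routing step above can be sketched like this. The simplified `Rule` shape and `routeFact` helper are illustrative only; Vitamem's actual `StructuredExtractionRule` interface may differ:

```typescript
// Hypothetical sketch of the routing step: a fact matching a structured
// rule becomes a direct profile-field update (no embedding needed);
// everything else enters the embedding pipeline for dedup + supersede.
type Rule = { field: string; pattern: RegExp };
type Fact = { content: string };

function routeFact(
  fact: Fact,
  rules: Rule[],
): { route: "profile"; field: string } | { route: "embedding" } {
  for (const rule of rules) {
    if (rule.pattern.test(fact.content)) {
      return { route: "profile", field: rule.field };
    }
  }
  return { route: "embedding" };
}
```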

Configure with structuredExtractionRules in your Vitamem config:

```typescript
import { createVitamem, HEALTH_STRUCTURED_RULES } from "vitamem";

const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
  structuredExtractionRules: HEALTH_STRUCTURED_RULES,
});
```

The HEALTH_STRUCTURED_RULES preset includes patterns for:

  • Vitals: A1C, blood pressure, weight, blood glucose
  • Allergies: any allergy mention
  • Medications: medication with dosage
  • Conditions: diagnosed conditions
  • Goals: health and wellness goals

You can also define custom rules using the StructuredExtractionRule interface.

Critical information can be automatically pinned during extraction so it always appears first in retrieval results. This ensures important facts (allergies, constraints, key preferences) are never buried by less important memories.

Pass autoPinRules in your VitamemConfig. Each rule is an AutoPinRule — either a regex pattern or a custom test function:

```typescript
import { createVitamem, HEALTH_AUTO_PIN_RULES } from "vitamem";
import type { AutoPinRule } from "vitamem";

// Use built-in health rules
const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
  autoPinRules: HEALTH_AUTO_PIN_RULES,
});
```

HEALTH_AUTO_PIN_RULES includes 7 rules that cover the most common safety-critical categories:

| Pattern | Reason | Example Match |
| --- | --- | --- |
| `\ballerg(y\|ic\|ies)\b` | allergy | "Allergic to penicillin" |
| `\banaphyla(xis\|ctic)\b` | anaphylaxis-risk | "History of anaphylaxis" |
| `\b(drug\|medication)\s*(interaction\|contraindication)\b` | drug-interaction | "Known drug interaction with warfarin" |
| `\bdo\s*not\s*(take\|use\|prescribe)\b` | contraindication | "Do not prescribe NSAIDs" |
| `\bemergency\s*contact\b` | emergency-contact | "Emergency contact: spouse at 555-1234" |
| `\bblood\s*type\b` | blood-type | "Blood type A+" |
| `\b\d+\s*(mg\|mcg\|ml\|units?)\b` | medication-dosage | "Takes Metformin 1000mg" |

You can define your own rules using regex patterns or test functions:

```typescript
const customRules: AutoPinRule[] = [
  { pattern: /\bDNR\b/i, reason: "do-not-resuscitate" },
  {
    test: (memory) =>
      memory.source === "confirmed" && /\bimplant\b/i.test(memory.content),
    reason: "medical-implant",
  },
];
```
  • Auto-pinned memories always appear first in retrieval results
  • Superseded memories can also be auto-pinned — pinning is additive and never removes existing pins
  • Pin status persists across memory updates

The extraction pipeline treats user messages and AI/assistant messages differently to avoid redundant writes:

  • User messages: All stated facts are extracted as new data. If the user says “I switched to Vercel for hosting”, that is always extracted.
  • AI/assistant messages: Only genuinely new information is extracted — recommendations, advice, or new facts the assistant introduces.

Critically, when the AI echoes back known data (e.g., repeating a user’s A1C value from memory context), that echo is not re-extracted. This prevents redundant writes and preserves data history.

| Message | Role | Extracted? | Reason |
| --- | --- | --- | --- |
| "I switched to Vercel for hosting" | User | Yes | New fact stated by user |
| "Since you're on Vercel, here are some deployment tips…" | Assistant | No (Vercel fact) | AI is echoing back a known value |
| "I recommend using Edge Functions for lower latency" | Assistant | Yes | Genuinely new recommendation |

Without this distinction, every time the AI acknowledged a fact from memory, the pipeline would re-extract and re-process it — triggering unnecessary deduplication checks.

Facts tagged as "inferred" are held to a high bar: they require specific, unambiguous evidence from the conversation. The extraction pipeline does not extract generalizations, assumptions, or speculations.

An inference is extracted only when:

  • The user says something that directly implies a specific fact (e.g., "I take my pills with breakfast" → inferred: takes medication in the morning)
  • The assistant provides a concrete recommendation that represents new information

It is never extracted for:

  • Broad generalizations unsupported by specific statements
  • Speculative conclusions about the user's intent or motivation
  • Re-statements of existing knowledge in different words

If a user says: “I exercise Monday, Wednesday, and Friday, and I’m cutting carbs.”

| Extracted | Source | Valid? |
| --- | --- | --- |
| "Exercises on Monday, Wednesday, and Friday" | confirmed | Yes — directly stated |
| "Reducing carbohydrate intake" | confirmed | Yes — directly stated |
| "Implementing changes to exercise routine" | inferred | No — unsupported generalization |
| "Motivated to improve health" | inferred | No — speculation |

When in doubt, the system errs on the side of not extracting. A missing inference can be captured in a future conversation, but a false inference pollutes the memory store and is harder to correct.

Extracted facts are automatically enriched with a temporal suffix showing when they were learned. For example, a fact extracted from a session on March 15, 2026 would be stored as:

"Prefers TypeScript over JavaScript (mentioned 2026-03-15)"

The pipeline derives the date from thread.lastMessageAt, which reflects the timestamp of the most recent message in the session. This temporal annotation enables:

  • Temporal reasoning — the LLM can understand when facts were learned and reason about time-sensitive information (e.g., “you mentioned this 3 months ago”)
  • Chronological retrieval — memories can be sorted by date, giving users and AI a timeline of how information evolved

Temporal encoding is applied automatically during the extraction pipeline. No configuration is needed.
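The annotation step can be sketched as a one-line suffix append. The helper below is hypothetical; Vitamem's internal implementation may differ:

```typescript
// Hypothetical sketch: derive a YYYY-MM-DD date from the session's last
// message timestamp and append it as a "(mentioned ...)" suffix.
function withTemporalSuffix(content: string, lastMessageAt: Date): string {
  const date = lastMessageAt.toISOString().slice(0, 10); // YYYY-MM-DD
  return `${content} (mentioned ${date})`;
}
```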

See Temporal Encoding for a deeper look at how dates are derived and used in retrieval.

Vitamem supports an optional reflection pass — a second LLM call that validates and enriches the facts produced by the initial extraction. When enabled, the reflection pass:

  • Catches contradictions — flags facts that conflict with each other or with existing memories
  • Enriches vague facts — adds specificity to facts that are too general (e.g., “exercises sometimes” becomes “exercises 3 times per week”)
  • Detects missed information — identifies important facts that the initial extraction overlooked

Enable reflection in your config:

```typescript
const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
  enableReflection: true,
});
```

Reflection adds one additional LLM call per dormant transition. It is especially valuable for any domain where accuracy matters (health, finance, legal, education) and less critical for casual applications.

See Reflection for details on how the reflection prompt works and how to customize it.

After extraction, each fact goes through the deduplication pipeline before being saved. Only facts that are genuinely new (not already represented in the user’s memory store) are persisted.
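A common way to implement such a "genuinely new" check is cosine similarity between the candidate's embedding and existing memory embeddings. The sketch below is illustrative only, and the 0.9 threshold is an assumption, not Vitamem's actual cutoff:

```typescript
// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// A candidate fact is treated as a duplicate when any existing memory's
// embedding is above the similarity threshold (0.9 here is illustrative).
function isDuplicate(
  candidate: number[],
  existing: number[][],
  threshold = 0.9,
): boolean {
  return existing.some((e) => cosine(candidate, e) >= threshold);
}
```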

The final stored memory includes:

  • content — the fact text
  • source — confirmed or inferred
  • embedding — the vector representation (e.g., 1536 dimensions for text-embedding-3-small)
  • userId — for retrieval scoping
  • threadId — for traceability
  • pinned — true if the memory matched an auto-pin rule (see above)
  • tags — string array for categorization (e.g. the auto-pin rule name)
  • lastRetrievedAt — timestamp of the last time this memory was returned by a retrieval query (used by active forgetting)
  • retrievalCount — how many times this memory has been retrieved (helps distinguish frequently-accessed facts from stale ones)
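Put together, a stored record might look roughly like the TypeScript shape below. Field names are taken from the list above; the exact type Vitamem exports may differ:

```typescript
// Hypothetical shape of a stored memory record, assembled from the
// fields documented above.
interface StoredMemory {
  content: string;              // fact text, including any temporal suffix
  source: "confirmed" | "inferred";
  embedding: number[];          // e.g. 1536 floats for text-embedding-3-small
  userId: string;               // retrieval scoping
  threadId: string;             // traceability back to the session
  pinned: boolean;              // true if an auto-pin rule matched
  tags: string[];               // categorization, e.g. the auto-pin rule name
  lastRetrievedAt: Date | null; // used by active forgetting
  retrievalCount: number;       // frequently-accessed vs stale facts
}
```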