
Introduction

Vitamem is a TypeScript library that gives AI applications lifecycle-aware long-term memory. Conversations have a natural lifecycle — active sessions, quiet pauses, natural endings. Vitamem tracks that lifecycle and uses it to decide when to extract and store facts, so your AI remembers what matters without the cost and noise of per-message embedding.

Every conversation thread moves through four states:

  • Active — the session is live, messages are stored in full
  • Cooling — the session has paused; a new message reactivates it
  • Dormant — facts are extracted, embedded once, and deduplicated
  • Closed — thread archived, memories live on and remain searchable

This lifecycle model is the core insight. Instead of embedding every message in real time, Vitamem waits for the natural rest point, then extracts what matters. The result: fewer API calls, cleaner memories, and better retrieval.
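The lifecycle above can be sketched as a small state machine. This is an illustrative model only: the state names mirror Vitamem's four states, but the idle thresholds and transition logic here are assumptions, not the library's actual implementation.

```typescript
// Illustrative sketch of the four-state thread lifecycle.
// Thresholds are assumed values for demonstration.
type ThreadState = "active" | "cooling" | "dormant" | "closed";

interface Thread {
  state: ThreadState;
  lastMessageAt: number; // epoch ms of the most recent message
}

const COOLING_AFTER_MS = 10 * 60 * 1000; // assumed: 10 min of silence
const DORMANT_AFTER_MS = 60 * 60 * 1000; // assumed: 1 hour of silence

// Advance a thread's state based on elapsed idle time.
function tick(thread: Thread, now: number): ThreadState {
  if (thread.state === "closed") return "closed";
  const idle = now - thread.lastMessageAt;
  if (idle >= DORMANT_AFTER_MS) return "dormant"; // fact extraction fires here
  if (idle >= COOLING_AFTER_MS) return "cooling";
  return "active";
}

// A new message reactivates the thread regardless of cooling state.
function onMessage(thread: Thread, now: number): Thread {
  return { state: "active", lastMessageAt: now };
}
```

The key property is that extraction work is deferred to the dormant transition rather than performed per message.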

import { createVitamem } from "vitamem";

const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
});

No manual adapter wiring. No boilerplate. Pass a provider name, an API key, and a storage backend — Vitamem handles the rest. Built-in adapters are available for OpenAI, Anthropic, and Ollama.

Most AI memory systems embed every message as it arrives. This creates three problems:

  1. Cost. Embedding calls on every message add up fast, especially in long conversations.
  2. Noise. Raw messages contain greetings, follow-up questions, and filler that dilute the memory store. Searching through noisy embeddings returns poor results.
  3. Duplication. A user who mentions “I use TypeScript” in five sessions gets five near-identical memories, wasting storage and confusing retrieval.

Vitamem takes a different approach: lifecycle-aware memory extraction.

Instead of embedding messages in real time, Vitamem waits until a conversation session goes dormant. At that point, it sends the full conversation to the LLM with an extraction prompt, gets back structured facts (like “Prefers TypeScript” and “Deploys on Vercel”), embeds those facts, and deduplicates them against existing memories using cosine similarity.
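Deduplication by cosine similarity can be sketched in a few lines. The similarity threshold below is an assumed value for illustration; Vitamem's actual cutoff may differ.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const DUPLICATE_THRESHOLD = 0.9; // assumed cutoff, not Vitamem's actual value

// A candidate fact is a duplicate if any stored embedding is too similar.
function isDuplicate(candidate: number[], stored: number[][]): boolean {
  return stored.some(v => cosineSimilarity(candidate, v) >= DUPLICATE_THRESHOLD);
}
```

A near-identical fact ("I use TypeScript" mentioned in a fifth session) scores near 1.0 against its stored twin and is discarded; a genuinely new fact scores well below the cutoff and is kept.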

The result:

  • ~8x fewer embedding calls — computed once per session on extracted facts, not on every message — plus significantly fewer LLM input tokens per chat through selective retrieval
  • Cleaner memories — structured facts instead of raw conversational noise
  • Automatic deduplication — genuinely new facts are preserved, redundant ones are discarded
  • Better retrieval — searching over concise facts produces more relevant results than searching over chat messages

Context window stuffing is expensive. Most systems dump entire memory stores into every LLM call, wasting tokens on irrelevant information. Vitamem retrieves only what’s relevant.

The real cost math:

  • Embedding costs ~$0.02/1M tokens — cheap
  • LLM input tokens cost $2.50–15/1M — 125–750x more expensive
  • The real savings come from sending ~300 tokens of relevant context instead of 2,000+ tokens of everything
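The arithmetic above can be made concrete. Prices and token counts are taken from the text; the per-call comparison is a back-of-envelope example, not a benchmark.

```typescript
// Back-of-envelope cost math using the rates listed above.
const LLM_INPUT_PER_M = 2.5;  // $/1M LLM input tokens (low end of $2.50-15)

const stuffedTokens = 2000;   // dump-everything context per call
const selectiveTokens = 300;  // relevant-facts-only context per call

// Dollars saved per chat call by sending only relevant context.
const perCallSavings =
  ((stuffedTokens - selectiveTokens) * LLM_INPUT_PER_M) / 1_000_000;

console.log(perCallSavings); // 0.00425 — fractions of a cent per call, compounding over every call
```

The per-call number is small, but it applies to every chat call, while embedding (at ~$0.02/1M tokens) is paid once per session.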

Vitamem embeds once per thread, at the dormant transition. The result is roughly 8x fewer embedding calls, significantly fewer LLM input tokens per chat through selective retrieval, and cleaner, more meaningful memories.

When a thread goes dormant, Vitamem extracts and stores facts about the user:

"Prefers TypeScript over JavaScript" (source: confirmed)
"Deploys on Vercel" (source: confirmed)
"Working on a React Native app" (source: confirmed)
"Prefers concise explanations" (source: inferred)

These facts are embedded as vectors and deduplicated against existing memories using cosine similarity. Later, when the user returns, the AI can retrieve relevant facts and respond like it actually remembers them.
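Retrieval over stored facts is a top-k similarity search. The names below are hypothetical, not Vitamem's API; the sketch only illustrates ranking fact embeddings against a query embedding.

```typescript
// Hypothetical shape of a stored memory: the extracted fact plus its embedding.
interface Memory {
  text: string;
  embedding: number[];
}

// Cosine similarity as the relevance score.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k memories most similar to the query embedding.
function retrieve(query: number[], memories: Memory[], k: number): Memory[] {
  return [...memories]
    .sort((a, b) => cosine(query, b.embedding) - cosine(query, a.embedding))
    .slice(0, k);
}
```

Because the search runs over short, structured facts rather than raw chat transcripts, the top-k results are dense with signal, which is what makes the ~300-token context windows described above workable.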

Key features

  • Lifecycle-aware threads — Active -> Cooling -> Dormant -> Closed. Threads automatically reactivate when users return.
  • Lower cost — Selective retrieval means fewer tokens per LLM call (send ~300 relevant tokens instead of 2,000+), plus ~8x fewer embedding calls through lifecycle batching.
  • Semantic deduplication — Cosine similarity prevents storing the same fact twice. Genuinely new facts are preserved.
  • Temporal encoding & active forgetting — Facts are timestamped when learned, and older unretrieved memories naturally decay — just like human memory. See Temporal Encoding and Active Forgetting.
  • Unified config API — One object. Provider, API key, storage. Built-in adapters for OpenAI, Anthropic, and Ollama.
  • autoRetrieve — Automatically inject relevant memories into every chat call. No manual retrieval needed.
  • Zero production dependencies — No hidden supply chain risk. Provider SDKs are optional peer dependencies.
Use cases

  • Health companions — remembers symptoms, medications, conditions, and health goals across sessions
  • Coaching assistants — tracks goals, progress, and setbacks over time
  • Tutoring systems — knows what students understand, where they struggle, and what they’ve mastered
  • Support agents — recalls customer context, issue history, and preferences
Next steps

  • Installation — Provider setup, peer dependencies, and TypeScript config
  • Quickstart — Get up and running in under 5 minutes
  • Tutorial — Build your first Vitamem project in 10 minutes
  • Thread Lifecycle — Understand the four states and transitions