Introduction
What is Vitamem?
Vitamem is a TypeScript library that gives AI applications lifecycle-aware long-term memory. Conversations have a natural lifecycle — active sessions, quiet pauses, natural endings. Vitamem tracks that lifecycle and uses it to decide when to extract and store facts, so your AI remembers what matters without the cost and noise of per-message embedding.
Every conversation thread moves through four states:
- Active — the session is live, messages are stored in full
- Cooling — the session is paused; a new message reactivates it
- Dormant — facts are extracted, embedded once, and deduplicated
- Closed — thread archived, memories live on and remain searchable
This lifecycle model is the core insight. Instead of embedding every message in real time, Vitamem waits for the natural rest point, then extracts what matters. The result: fewer API calls, cleaner memories, and better retrieval.
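The four states above can be read as a small state machine. Here is an illustrative sketch in TypeScript — the state names come from the list above, but the event names and transition triggers are assumptions for illustration, not Vitamem's actual internals:

```ts
// Illustrative four-state thread lifecycle. Event names are hypothetical.
type ThreadState = "active" | "cooling" | "dormant" | "closed";
type ThreadEvent = "message" | "pauseTimeout" | "dormantTimeout" | "close";

function nextState(state: ThreadState, event: ThreadEvent): ThreadState {
  switch (state) {
    case "active":
      if (event === "pauseTimeout") return "cooling"; // session went quiet
      if (event === "close") return "closed";
      return "active";
    case "cooling":
      if (event === "message") return "active"; // a new message reactivates
      if (event === "dormantTimeout") return "dormant"; // extraction happens here
      if (event === "close") return "closed";
      return "cooling";
    case "dormant":
      if (event === "message") return "active"; // user returned
      if (event === "close") return "closed"; // memories remain searchable
      return "dormant";
    case "closed":
      return "closed"; // terminal
  }
}

console.log(nextState("active", "pauseTimeout")); // → "cooling"
```

The key point is that fact extraction runs on the Cooling → Dormant transition, not on every message.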
3 Lines to Long-Term Memory
```ts
import { createVitamem } from "vitamem";

const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
});
```

No manual adapter wiring. No boilerplate. Pass a provider name, an API key, and a storage backend — Vitamem handles the rest. Built-in adapters are available for OpenAI, Anthropic, and Ollama.
Why Vitamem
Most AI memory systems embed every message as it arrives. This creates three problems:
- Cost. Embedding calls on every message add up fast, especially in long conversations.
- Noise. Raw messages contain greetings, follow-up questions, and filler that dilute the memory store. Searching through noisy embeddings returns poor results.
- Duplication. A user who mentions “I use TypeScript” in five sessions gets five near-identical memories, wasting storage and confusing retrieval.
Vitamem takes a different approach: lifecycle-aware memory extraction.
Instead of embedding messages in real time, Vitamem waits until a conversation session goes dormant. At that point, it sends the full conversation to the LLM with an extraction prompt, gets back structured facts (like “Prefers TypeScript” and “Deploys on Vercel”), embeds those facts, and deduplicates them against existing memories using cosine similarity.
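The deduplication step is standard cosine-similarity comparison. A minimal sketch, assuming plain number arrays for embeddings and an illustrative 0.9 similarity threshold (the actual threshold Vitamem uses is not stated here):

```ts
// Cosine similarity between two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A candidate fact is a duplicate if any stored embedding is too similar.
function isDuplicate(
  candidate: number[],
  existing: number[][],
  threshold = 0.9, // assumed value for illustration
): boolean {
  return existing.some((vec) => cosine(candidate, vec) >= threshold);
}
```

With this check, "Prefers TypeScript" mentioned in a fifth session compares as near-identical to the stored fact and is discarded, while a genuinely new fact falls below the threshold and is kept.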
The result:
- ~8x fewer embedding calls — computed once per session on extracted facts, not on every message — plus significantly fewer LLM input tokens per chat through selective retrieval
- Cleaner memories — structured facts instead of raw conversational noise
- Automatic deduplication — genuinely new facts are preserved, redundant ones are discarded
- Better retrieval — searching over concise facts produces more relevant results than searching over chat messages
The Core Insight
Context window stuffing is expensive. Most systems dump entire memory stores into every LLM call, wasting tokens on irrelevant information. Vitamem retrieves only what’s relevant.
The real cost math:
- Embedding costs ~$0.02/1M tokens — cheap
- LLM input tokens cost $2.50–15/1M — 125–750x more expensive
- The real savings come from sending ~300 tokens of relevant context instead of 2,000+ tokens of everything
Vitamem embeds once per thread at the dormant transition (~8x fewer embedding calls) and retrieves only relevant context at chat time (significantly fewer LLM input tokens per call), producing better, more meaningful memories.
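A back-of-envelope check of the cost math above, using the doc's per-million-token prices. The 2,000-token and 300-token context sizes are the illustrative figures from the list:

```ts
// Prices in dollars per 1M tokens, from the list above.
const EMBED_PER_M = 0.02;    // embedding
const LLM_INPUT_PER_M = 2.5; // cheap end of the LLM input range

// LLM input tokens vs. embedding tokens: the 125x low end of "125–750x".
console.log(Math.round(LLM_INPUT_PER_M / EMBED_PER_M)); // → 125

// Per-call input cost: stuffing everything vs. selective retrieval.
const stuffed = (2000 / 1e6) * LLM_INPUT_PER_M;
const selective = (300 / 1e6) * LLM_INPUT_PER_M;
console.log(stuffed.toFixed(6));   // → 0.005000 per call
console.log(selective.toFixed(6)); // → 0.000750 per call
```

The arithmetic shows why selective retrieval dominates the savings: trimming LLM input tokens saves orders of magnitude more per call than trimming embedding calls, even before the ~8x embedding reduction.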
What Vitamem Stores
When a thread goes dormant, Vitamem extracts and stores facts about the user:
```
"Prefers TypeScript over JavaScript" (source: confirmed)
"Deploys on Vercel" (source: confirmed)
"Working on a React Native app" (source: confirmed)
"Prefers concise explanations" (source: inferred)
```

These facts are embedded as vectors and deduplicated against existing memories using cosine similarity. Later, when the user returns, the AI can retrieve relevant facts and respond like it actually remembers them.
Key Features
- Lifecycle-aware threads — Active -> Cooling -> Dormant -> Closed. Automatically reactivates when users return.
- Lower cost — Selective retrieval means fewer tokens per LLM call (send ~300 relevant tokens instead of 2,000+), plus ~8x fewer embedding calls through lifecycle batching.
- Semantic deduplication — Cosine similarity prevents storing the same fact twice. Genuinely new facts are preserved.
- Temporal encoding & active forgetting — Facts are timestamped when learned, and older unretrieved memories naturally decay — just like human memory. See Temporal Encoding and Active Forgetting.
- Unified config API — One object. Provider, API key, storage. Built-in adapters for OpenAI, Anthropic, and Ollama.
- autoRetrieve — Automatically inject relevant memories into every chat call. No manual retrieval needed.
- Zero production dependencies — No hidden supply chain risk. Provider SDKs are optional peer dependencies.
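The temporal encoding and active forgetting feature above can be sketched with an exponential decay on retrieval scores. The formula, the 30-day half-life, and the function name here are all assumptions for illustration; the docs describe the behavior, not this exact implementation:

```ts
// Hypothetical decay scoring: similarity discounted by memory age.
const HALF_LIFE_DAYS = 30; // assumed half-life

function decayScore(similarity: number, ageDays: number): number {
  // Every HALF_LIFE_DAYS without retrieval, the effective score halves.
  const decay = Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
  return similarity * decay;
}

// A highly similar but stale memory can rank below a fresher,
// somewhat less similar one:
console.log(decayScore(0.95, 90).toFixed(3)); // → "0.119"
console.log(decayScore(0.8, 0).toFixed(3));   // → "0.800"
```

Under a scheme like this, old memories that are never retrieved fade out of results naturally rather than needing explicit deletion.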
Use Cases
- Health companions — remembers symptoms, medications, conditions, and health goals across sessions
- Coaching assistants — tracks goals, progress, and setbacks over time
- Tutoring systems — knows what students understand, where they struggle, and what they’ve mastered
- Support agents — recalls customer context, issue history, and preferences
Next Steps
- Installation — Provider setup, peer dependencies, and TypeScript config
- Quickstart — Get up and running in under 5 minutes
- Tutorial — Build your first Vitamem project in 10 minutes
- Thread Lifecycle — Understand the four states and transitions