Memory Tiers
Understanding the different layers of memory storage.
Overview
Raven organizes memory into conceptual tiers based on access patterns, retention requirements, and storage characteristics. This tiered approach optimizes for both performance and cost.
- Context Window: model's active context, not persisted
- Core Memory: fast-access user profile and session state
- Episodic Memory: conversation history and interactions
- Semantic Memory: extracted patterns and embeddings
Tier 1: Context Window
The context window is the LLM's active working memory. It's ephemeral and exists only during a single request/response cycle.
Characteristics
- Not persisted by Raven (managed by your LLM)
- Limited by the model's token limit (4K-128K+ tokens)
- Includes the system prompt, conversation history, and retrieved context
- Cleared after each request completes
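Because the context window is bounded by the token limit, whatever assembles it must fit the system prompt plus as much recent history as the budget allows. The sketch below illustrates that idea with a rough character-count heuristic in place of a real tokenizer; `assembleContext` and `estimateTokens` are illustrative names, not Raven APIs.

```typescript
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Rough heuristic (~4 characters per token); a real tokenizer would be used in practice.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep the system prompt, then fill the remaining budget with the most recent history.
function assembleContext(systemPrompt: string, history: Message[], budget: number): Message[] {
  let used = estimateTokens(systemPrompt);
  const recent: Message[] = [];
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > budget) break; // budget exhausted: drop older messages
    used += cost;
    recent.unshift(history[i]); // keep chronological order
  }
  return [{ role: 'system', content: systemPrompt }, ...recent];
}
```

Older messages are dropped first, which is why retrieved context from the lower tiers matters: facts that fell out of the window can be re-injected on demand.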
Tier 2: Core Memory
Fast-access storage for user profiles, session state, and frequently accessed data. Stored in Redis for sub-millisecond access.
What's Stored
- User metadata and preferences
- Conversation buffer (pending interactions)
- Session state and context
- API key lookups and validation
- Recent blob ID references
Access Pattern
// Buffer key pattern
tenant:{tenant_id}:user:{user_id}:conversation:{conv_id}:buffer
// Example: Store pending interaction
await redis.rpush(bufferKey, JSON.stringify(interaction));
// Example: Get buffer size
const size = await redis.llen(bufferKey);

Tier 3: Episodic Memory
Long-term storage of conversation history. Episodic memories are specific events: individual interactions between users and agents.
Structure
interface EpisodicMemory {
blob_type: 'episodic';
conversations: Array<{
conversation_id: string;
timestamp: string;
user_message: string;
agent_response: string;
metadata?: Record<string, unknown>;
}>;
}

When It's Created
- Buffer reaches batch_size (default: 10 interactions)
- Explicit flush request from agent
- Conversation ends or times out
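The buffering policy above can be sketched as follows. An in-memory array stands in for the Redis list, and `ConversationBuffer` is an illustrative name rather than a Raven class; the `batchSize` default of 10 mirrors the documented batch_size.

```typescript
interface Interaction {
  conversation_id: string;
  timestamp: string;
  user_message: string;
  agent_response: string;
}

interface EpisodicBlob {
  blob_type: 'episodic';
  conversations: Interaction[];
}

class ConversationBuffer {
  private pending: Interaction[] = [];
  constructor(private batchSize = 10) {}

  // Returns a blob when the buffer fills, otherwise null (still buffering).
  push(interaction: Interaction): EpisodicBlob | null {
    this.pending.push(interaction);
    return this.pending.length >= this.batchSize ? this.flush() : null;
  }

  // Explicit flush, e.g. on conversation end or timeout.
  flush(): EpisodicBlob {
    const blob: EpisodicBlob = { blob_type: 'episodic', conversations: this.pending };
    this.pending = [];
    return blob;
  }
}
```

In the real system the flushed blob would be encrypted and written to Walrus, and the Redis list trimmed.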
Tier 4: Semantic Memory
Derived knowledge extracted from episodic memories. Includes facts, patterns, preferences, and vector embeddings for semantic search.
Structure
interface SemanticMemory {
blob_type: 'semantic';
extracted_facts: string[]; // "User prefers TypeScript"
patterns: Array<{
pattern: string;
frequency: number;
confidence: number;
}>;
}
interface EmbeddingMemory {
blob_type: 'embeddings';
embeddings: Array<{
text: string;
vector: number[];
metadata?: Record<string, unknown>;
}>;
}

How It's Generated
- Background analysis worker processes episodic memory
- LLM extracts facts and patterns from conversations
- Embedding models generate vectors for semantic search
- Patterns are updated based on frequency and recency
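Once vectors exist, semantic search reduces to ranking stored embeddings by similarity to a query vector. A minimal sketch using cosine similarity (the toy 3-dimensional vectors and `semanticSearch` helper are illustrative; a real embedding model produces much higher-dimensional vectors):

```typescript
interface Embedding {
  text: string;
  vector: number[];
}

// Cosine similarity: dot product normalized by both magnitudes.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored embeddings by similarity to the query vector, returning the top K.
function semanticSearch(query: number[], store: Embedding[], topK = 3): Embedding[] {
  return [...store]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, topK);
}
```

A production deployment would typically delegate this to a vector index rather than a linear scan, but the ranking criterion is the same.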
Storage Comparison
| Tier | Storage | Latency | Retention | Encrypted |
|---|---|---|---|---|
| Context Window | In-memory | <1ms | Request only | N/A |
| Core Memory | Redis | <5ms | Session/TTL | At rest |
| Episodic | Walrus | 50-200ms | Configurable | AES-256 |
| Semantic | Walrus | 50-200ms | Configurable | AES-256 |
Memory Query Flow
When an agent queries for context, Raven searches across tiers:
Query: "What programming language does the user prefer?"
↓
1. Check Tier 2 (Core Memory)
└─ Look for user preferences in Redis
↓
2. Search Tier 4 (Semantic Memory)
└─ Query extracted facts: "User prefers TypeScript"
└─ Semantic search on embeddings for similar concepts
↓
3. Search Tier 3 (Episodic Memory)
└─ Find relevant conversation snippets
└─ Rank by relevance and recency
↓
4. Assemble Context
└─ Combine facts + relevant episodes
└─ Return for injection into Tier 1 (Context Window)
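The four steps above can be sketched as one function. Each tier is a simplified stand-in: a Map for core memory, plain string arrays for the semantic and episodic stores, and naive keyword matching in place of embedding search. `queryContext` and these store types are illustrative, not Raven's API.

```typescript
interface ContextResult {
  facts: string[];
  episodes: string[];
}

type FactStore = Map<string, string>; // Tier 2: core memory (Redis in practice)
type SemanticStore = string[];        // Tier 4: extracted facts
type EpisodicStore = string[];        // Tier 3: conversation snippets

function queryContext(
  query: string,
  core: FactStore,
  semantic: SemanticStore,
  episodic: EpisodicStore,
): ContextResult {
  const facts: string[] = [];
  const terms = query.toLowerCase().split(/\s+/);

  // 1. Core memory: direct key lookup for cached preferences.
  const cached = core.get(query);
  if (cached) facts.push(cached);

  // 2. Semantic memory: keyword match stands in for embedding search.
  facts.push(...semantic.filter(f => terms.some(t => f.toLowerCase().includes(t))));

  // 3. Episodic memory: matching snippets, most recent first.
  const episodes = episodic
    .filter(e => terms.some(t => e.toLowerCase().includes(t)))
    .reverse();

  // 4. Assemble for injection into the context window (Tier 1).
  return { facts, episodes };
}
```

The key point is the ordering: cheap, structured lookups run before the slower blob-backed tiers, and everything is merged into a single bundle for the context window.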