- Letta: Uses an in-context memory design, keeping messages and system prompts within a configurable token limit. Core memory blocks remain visible in the prompt window, and recall memory holds recently accessed data.
- Mem0.ai: Offers personalization by storing conversation history and user preferences in memory. Provides short-term memory in chat contexts, backed by local or remote vector stores.
- Zep: Maintains session-based interactions, storing conversation transcripts as memory blocks. Automatically summarizes long histories to keep context windows concise.
- CrewAI: Provides short-term memory for recent interactions, using contextual awareness to keep immediate conversation elements accessible.
- Cognee: Recommends combining immediate context windows with fast retrieval methods to reduce hallucinations and improve accuracy.
MemGPT
MemGPT: A Hierarchical Memory System for Large Language Models (LLMs)
- MemGPT is an LLM system that addresses the limited context window of LLMs by drawing inspiration from the hierarchical memory systems found in traditional operating systems (OSes).
- MemGPT introduces virtual context management to provide the illusion of an extended context for LLMs, much like how virtual memory in an OS lets applications work with datasets exceeding available physical memory.
- The key idea behind MemGPT is to let LLMs manage their own memory through function calls: the model can read and write data to external sources, modify its own context, and decide when to respond.
- MemGPT is currently implemented as Letta.
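The self-directed memory management described above can be sketched as an agent that exposes memory operations as callable tools and executes the function calls the LLM emits. The class, dispatcher, and function names below are illustrative stand-ins loosely mirroring MemGPT's described tools, not Letta's actual API; the substring search stands in for real vector retrieval.

```python
# Sketch of MemGPT-style self-directed memory via function calls.
# MemoryAgent, dispatch(), and the tool signatures are hypothetical.

class MemoryAgent:
    def __init__(self):
        self.working_context = []  # read/write scratchpad kept in-prompt
        self.archival = []         # persistent out-of-context store

    # --- tools the LLM can invoke through function calling ---
    def core_memory_append(self, content: str) -> str:
        self.working_context.append(content)
        return "OK"

    def archival_insert(self, content: str) -> str:
        self.archival.append(content)
        return "OK"

    def archival_search(self, query: str) -> list:
        # Naive substring match stands in for vector search.
        return [doc for doc in self.archival if query.lower() in doc.lower()]

    def dispatch(self, call: dict):
        """Execute a function call emitted by the LLM, e.g.
        {"name": "archival_insert", "arguments": {"content": "..."}}."""
        fn = getattr(self, call["name"])
        return fn(**call["arguments"])


agent = MemoryAgent()
agent.dispatch({"name": "core_memory_append",
                "arguments": {"content": "User prefers concise answers"}})
agent.dispatch({"name": "archival_insert",
                "arguments": {"content": "Project deadline is March 3"}})
hits = agent.dispatch({"name": "archival_search",
                       "arguments": {"query": "deadline"}})
```

The point of the dispatcher is that the model itself decides which memory tier to touch; the runtime only executes the calls and returns results into the prompt.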
Components of Letta
- Prompt (main context), which contains:
  - System instructions
  - Working context: a dynamic, read/write area where MemGPT stores essential information about the user, task, or conversation. Think of this as a scratchpad for retaining key facts and preferences.
  - FIFO queue: a rolling history of messages, system actions, and function call inputs/outputs. It operates on a First-In, First-Out basis, so older entries are evicted as new ones arrive. A recursive summary of evicted messages is stored at the head of the queue to preserve some context from past interactions.
- Recall memory: a dedicated database for storing and retrieving past messages, ensuring a complete record of interactions even as the FIFO queue evicts older entries.
- Archival storage: a persistent database for storing arbitrary-length text objects, such as documents, code, or any other information deemed relevant for future retrieval. MemGPT uses PostgreSQL with the pgvector extension for archival storage, enabling efficient vector search with an HNSW index.
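The FIFO queue's evict-and-summarize behavior can be sketched in a few lines. Here `summarize()` is a placeholder that just counts evicted messages; in MemGPT the recursive summary is produced by calling the LLM itself, and the class below is an illustration, not Letta's implementation.

```python
# Sketch of a FIFO queue with a recursive summary of evicted messages.
from collections import deque


class FIFOQueue:
    def __init__(self, max_messages: int):
        self.max_messages = max_messages
        self.summary = ""        # recursive summary of evicted messages
        self.queue = deque()

    def summarize(self, old_summary: str, evicted: list) -> str:
        # Placeholder for an LLM summarization call: fold the old
        # summary and the newly evicted messages into a new summary.
        prev = int(old_summary.split()[0]) if old_summary else 0
        return f"{prev + len(evicted)} earlier messages summarized"

    def append(self, message: str):
        self.queue.append(message)
        if len(self.queue) > self.max_messages:
            # Evict the oldest half and fold it into the summary.
            k = len(self.queue) // 2
            evicted = [self.queue.popleft() for _ in range(k)]
            self.summary = self.summarize(self.summary, evicted)

    def prompt_window(self) -> list:
        # The summary sits at the head of the queue, as described above.
        head = [f"[summary] {self.summary}"] if self.summary else []
        return head + list(self.queue)


q = FIFOQueue(max_messages=4)
for i in range(1, 8):
    q.append(f"m{i}")
```

After seven messages with a four-message window, two eviction rounds have run: the four oldest messages are folded into the summary and only the three most recent remain verbatim.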
Cognee
Reliable LLM Memory for AI Applications and AI Agents
Cognee merges graph and vector databases to uncover hidden relationships and new patterns in your data. You can automatically model, load, and retrieve entities and objects representing your business domain, then analyze their relationships, uncovering insights that neither vector stores nor graph stores alone can provide.
https://docs.cognee.ai/use-cases/code-assistants
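The graph-plus-vector idea above can be illustrated with a toy hybrid retriever: rank entities by vector similarity, then expand the top hits one hop through a relationship graph to surface connected entities that similarity alone would miss. The data model and `hybrid_search()` function here are hypothetical, not Cognee's actual API.

```python
# Toy sketch of combined vector + graph retrieval (illustrative only).
import math

vectors = {                 # entity -> toy 2-D embedding
    "invoice": [1.0, 0.0],
    "payment": [0.9, 0.1],
    "employee": [0.0, 1.0],
}
edges = {                   # entity -> related entities (knowledge graph)
    "invoice": ["payment", "customer"],
    "payment": ["invoice"],
    "employee": ["department"],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def hybrid_search(query_vec, k=1):
    # 1) Vector step: top-k entities by cosine similarity.
    ranked = sorted(vectors, key=lambda e: cosine(vectors[e], query_vec),
                    reverse=True)[:k]
    # 2) Graph step: expand each hit one hop to pull in related entities.
    expanded = {nbr for e in ranked for nbr in edges.get(e, [])}
    return ranked, sorted(expanded)


hits, related = hybrid_search([1.0, 0.05])
```

A query vector close to "invoice" returns it from the vector step, while the graph step also surfaces "customer", an entity with no embedding at all, which a pure vector store could never return.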
Resources
- Letta: https://docs.letta.com/
- Mem0: https://docs.mem0.ai/
- CrewAI: https://docs.crewai.com/concepts/memory
- Zep: https://help.getzep.com/
- Memary: https://kingjulio8238.github.io/memarydocs/concepts/
- Cognee: https://www.cognee.ai/blog/fundamentals/llm-memory-cognitive-architectures-with-ai
- Graphlit survey: https://www.graphlit.com/blog/survey-of-ai-agent-memory-frameworks