Memorylayer

Hosted memory runtime for agents

Architecture

Thin cloud layer. Real memory engine.

Memorylayer keeps hosted concerns separate from Engram: identity, workspaces, API keys, audit trails, connection kits, and a small HTTP bridge. The memory runtime is Postgres-backed Engram with explicit embedding and reranker models.

client, service, schema, engine
01 / Client
Agents and scripts

Codex, Claude, custom runners, CI jobs, and scripts call a workspace-scoped surface.

02 / Service
Auth and control

GitHub login, workspace membership, hashed keys, audit trails, and usage events.

03 / Boundary
Separate schemas

Every workspace maps to its own Engram schema and operational history.

04 / Engram
Memory runtime

Retrieval, graph, handoff, curation, and continuity stay inside the real engine.
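To make the client-to-engine path above concrete, here is a minimal sketch of how a script might build a workspace-scoped call. The base URL and bearer-token header come from this page. The route `/w/{slug}/recall`, the JSON body shape, and the function name `build_recall_request` are hypothetical, chosen only for illustration; the published OpenAPI document is the source of truth for real paths.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://memorylayer.run"


def build_recall_request(slug: str, api_key: str, query: str) -> Request:
    """Build (but do not send) a workspace-scoped request.

    The route and body shape are hypothetical placeholders.
    """
    url = f"{BASE_URL}/w/{quote(slug, safe='')}/recall"
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_recall_request("acme", "ml_example", "deploy notes")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it runs without network access.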

Runtime: FastAPI 0.136.3 on Uvicorn.
Memory: Postgres schemas using BAAI/bge-small-en-v1.5.
Surface: 17 examples, 60 tools, OpenAPI, and manifests.

Architecture specs.

The live service is a narrow hosted control plane around the Engram runtime. These are the concrete parts currently wired in this repo.

OpenAPI

Service runtime

FastAPI app served by Uvicorn on port 8090 inside Docker, exposed through https://memorylayer.run.

Python 3.12.13, FastAPI, Uvicorn

Database

Postgres stores service metadata, users, workspaces, keys, usage events, audit events, ingest runs, and each workspace's Engram schema.

Postgres, SQLAlchemy 2.0.50, psycopg 3.3.4

Memory package

engram-memory-system 0.5.2 provides the store, retrieval, graph, curation, session handoff, and MCP-compatible tool layer.

Engram, MCP bridge, workspace cache

Models used.

These are the memory models exposed by Engram's current config defaults, with the hosted service overriding storage to Postgres.

embedding
BAAI/bge-small-en-v1.5 creates dense vectors for semantic memory search.
dimension
384 dimensions per embedding vector.
reranker
cross-encoder/ms-marco-MiniLM-L-6-v2 is the cross-encoder used to rerank retrieved results when reranking is available.
backend
embedding_backend = auto; the hosted runtime uses a CPU PyTorch container.
storage
Engram's local default is sqlite, but Memorylayer sets storage_backend = postgres and injects a workspace schema DSN.
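The defaults and hosted overrides listed above could be modelled roughly as follows. This is a sketch under stated assumptions, not Engram's actual config class: only `embedding_backend` and `storage_backend` are names taken from this page, while `MemoryConfig`, its other fields, `workspace_dsn`, and `hosted_config` are hypothetical. Pinning `search_path` through the libpq `options` connection parameter is one plausible way to scope a DSN to a workspace schema; the service may do it differently.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional
from urllib.parse import quote, urlencode, urlsplit


@dataclass(frozen=True)
class MemoryConfig:
    # Values come from the list above; field names other than
    # embedding_backend and storage_backend are hypothetical.
    embedding_model: str = "BAAI/bge-small-en-v1.5"
    embedding_dim: int = 384
    reranker_model: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"
    embedding_backend: str = "auto"
    storage_backend: str = "sqlite"  # Engram's local default
    dsn: Optional[str] = None


# Conservative identifier check so a slug can never inject SQL or options.
_SCHEMA_RE = re.compile(r"^[a-z_][a-z0-9_]{0,62}$")


def workspace_dsn(base_dsn: str, schema: str) -> str:
    """Return base_dsn with a libpq `options` parameter pinning search_path."""
    if not _SCHEMA_RE.match(schema):
        raise ValueError(f"unsafe schema name: {schema!r}")
    parts = urlsplit(base_dsn)
    extra = urlencode({"options": f"-csearch_path={schema}"}, quote_via=quote)
    query = f"{parts.query}&{extra}" if parts.query else extra
    return parts._replace(query=query).geturl()


def hosted_config(base_dsn: str, schema: str) -> MemoryConfig:
    """Apply the hosted overrides: Postgres storage plus a schema-scoped DSN."""
    return replace(
        MemoryConfig(),
        storage_backend="postgres",
        dsn=workspace_dsn(base_dsn, schema),
    )


cfg = hosted_config("postgresql://app@db:5432/memory", "ws_acme")
print(cfg.storage_backend, cfg.dsn)
```

Validating the schema name before it reaches the DSN keeps the workspace boundary from depending on how a slug was typed by a user.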

Boundaries that matter.

The hosted service does not pretend to be the memory system. It gives the memory system accounts, keys, URLs, and operational controls.

Account plane

GitHub OAuth, sessions, members, invites, and dashboard routes.

browser, human

Workspace plane

API keys, audit events, usage events, ingestion runs, exports, and connection kits.

operator, agent

Memory plane

Engram recall, remember, graph, curation, summaries, handoffs, and health checks.

runtime, engine

Request path.

A workspace API call has a small number of predictable checkpoints.

01 guard
Host, method, path, size, origin, rate limit, and security headers are handled before routing.
02 auth
Workspace routes resolve a bearer or X-API-Key token against a hashed key for that slug.
03 audit
Successful and selected failed calls are recorded with route, key, status, and metadata.
04 engine
The call enters the workspace Engram runtime and returns memory results through a stable JSON shape.
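The four checkpoints above can be sketched as a single handler. This is illustrative only: the class and function names are hypothetical, SHA-256 is an assumed hash (the page only says keys are hashed), headers are assumed to arrive with lowercase names, and the audit record is written once the status is known, which is an assumption about ordering. The guard step is reduced to the body-size check.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

MAX_BODY_BYTES = 2_000_000  # default request body limit from the runtime limits


def hash_key(raw_key: str) -> str:
    # Assumption: keys are stored as SHA-256 hex digests.
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def extract_token(headers: Dict[str, str]) -> Optional[str]:
    """Accept either `Authorization: Bearer <key>` or `X-API-Key: <key>`."""
    auth = headers.get("authorization", "")
    if auth.lower().startswith("bearer "):
        return auth[7:].strip() or None
    return headers.get("x-api-key") or None


@dataclass
class Pipeline:
    key_hashes: Dict[str, Set[str]]  # workspace slug -> stored key hashes
    audit_log: List[dict] = field(default_factory=list)

    def handle(self, slug: str, headers: Dict[str, str], body: bytes,
               engine: Callable[[bytes], object]):
        # 01 guard: reject oversized bodies before any other work.
        if len(body) > MAX_BODY_BYTES:
            return self._finish(slug, 413, {"error": "payload too large"})
        # 02 auth: hash the presented token and compare in constant time
        # against the hashes stored for this slug only.
        token = extract_token(headers)
        candidate = hash_key(token) if token else ""
        stored = self.key_hashes.get(slug, set())
        if not any(hmac.compare_digest(candidate, h) for h in stored):
            return self._finish(slug, 401, {"error": "invalid key"})
        # 04 engine: hand the call to the workspace runtime.
        return self._finish(slug, 200, {"ok": True, "result": engine(body)})

    def _finish(self, slug: str, status: int, payload: dict):
        # 03 audit: record the outcome alongside the workspace.
        self.audit_log.append({"workspace": slug, "status": status})
        return status, payload


pipeline = Pipeline(key_hashes={"acme": {hash_key("secret")}})
print(pipeline.handle("acme", {"x-api-key": "secret"}, b"hi", lambda b: b.decode()))
```

Because the lookup is keyed by slug, a valid key for one workspace is rejected on another, which mirrors the "hashed key for that slug" wording.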

Deployment shape.

Simple by design: one Python app, one Postgres database, one long-lived memory process model.

FastAPI

Public pages, workspace dashboard, OAuth, and JSON endpoints.

Postgres

App metadata plus per-workspace Engram schemas; the Postgres search_path is scoped to each workspace schema.

Docker

Repeatable VPS deployment using python:3.12-slim and a narrow exposed port.

Surface

2 starter skills, 5 playbooks, and 6 SDK snippets.

Runtime limits.

The app keeps the operational shape explicit so the memory process stays predictable.

cache
Up to 16 workspace runtimes are kept warm in-process.
ttl
Idle workspace runtimes are pruned after 1800 seconds.
payload
Default request body limit is 2,000,000 bytes (about 2 MB), enforced before routing.
api limit
Default API throttle is 240 requests per minute per client bucket.
auth limit
Default auth throttle is 12 requests per minute per client bucket.
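The cache, TTL, and throttle limits above could be implemented along these lines. This is a sketch, not the service's code: `RuntimeCache` and `Throttle` are hypothetical names, least-recently-used eviction is an assumed policy for the 16-runtime cap, and a sliding window is one of several reasonable readings of "requests per minute".

```python
import time
from collections import OrderedDict, deque


class RuntimeCache:
    """Keep up to max_size workspace runtimes warm; prune idle ones after a TTL."""

    def __init__(self, max_size=16, ttl_seconds=1800, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = OrderedDict()  # slug -> (runtime, last_used)

    def get(self, slug, factory):
        now = self.clock()
        self.prune(now)
        if slug in self._items:
            runtime, _ = self._items.pop(slug)
        else:
            runtime = factory(slug)  # cold start: build the runtime
        self._items[slug] = (runtime, now)  # most recently used goes last
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict least recently used
        return runtime

    def prune(self, now=None):
        now = self.clock() if now is None else now
        stale = [s for s, (_, used) in self._items.items() if now - used > self.ttl]
        for slug in stale:
            del self._items[slug]

    def __len__(self):
        return len(self._items)


class Throttle:
    """Sliding window: at most `limit` calls per `window` seconds per bucket."""

    def __init__(self, limit=240, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits = {}  # bucket -> deque of call timestamps

    def allow(self, bucket):
        now = self.clock()
        hits = self._hits.setdefault(bucket, deque())
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # drop calls that fell out of the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


api_throttle = Throttle(limit=240)   # default API limit
auth_throttle = Throttle(limit=12)   # default auth limit
```

Injecting the clock keeps both classes deterministic under test; production code would leave the default `time.monotonic` in place.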