Architecture

Thin cloud layer. Real memory engine.

Memorylayer keeps hosted concerns separate from Engram: identity, workspaces, API keys, audit trails, connection kits, and a small HTTP bridge. The memory runtime is Postgres-backed Engram with explicit embedding and reranker models.



01 / Client

Agents and scripts

Codex, Claude, custom runners, CI jobs, and scripts call a workspace-scoped surface.

02 / Service

Auth and control

GitHub login, workspace membership, hashed keys, audit trails, and usage events.

03 / Boundary

Separate schemas

Every workspace maps to its own Engram schema and operational history.

04 / Engram

Memory runtime

Retrieval, graph, handoff, curation, and continuity stay inside the real engine.
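A minimal client-side sketch of a workspace-scoped call from an agent, CI job, or script, using only the Python standard library. The Bearer header follows the auth checkpoint described on this page. The route `/api/workspaces/{slug}/recall`, the `{"query": ...}` body, the slug `acme`, and the key `ml_test_key` are hypothetical placeholders. Check the service's OpenAPI document for the real paths and schemas.

```python
import json
import urllib.request

BASE_URL = "https://memorylayer.run"  # public host named on this page


def build_recall_request(slug: str, api_key: str, query: str) -> urllib.request.Request:
    """Build, but do not send, a workspace-scoped call.

    The route and JSON body are hypothetical placeholders; the real ones
    are listed in the service's OpenAPI document.
    """
    url = f"{BASE_URL}/api/workspaces/{slug}/recall"  # hypothetical route
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            # Bearer auth; an X-API-Key header is the documented alternative.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_recall_request("acme", "ml_test_key", "open tasks for the release")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```

Building the request separately from sending it keeps the workspace slug and key handling testable without network access.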

Runtime: FastAPI 0.136.3 on Uvicorn.

Memory: Postgres schemas using BAAI/bge-small-en-v1.5.

Surface: 17 examples, 60 tools, OpenAPI, and manifests.

Architecture specs.

The live service is a narrow hosted control plane around the Engram runtime. These are the concrete parts currently wired in this repo.


Service runtime

FastAPI app served by Uvicorn on port 8090 inside Docker, exposed through https://memorylayer.run.

Python 3.12.13 / FastAPI / Uvicorn

Database

Postgres stores service metadata, users, workspaces, keys, usage events, audit events, ingest runs, and each workspace's Engram schema.

Postgres / SQLAlchemy 2.0.50 / psycopg 3.3.4

Memory package

engram-memory-system 0.5.2 provides the store, retrieval, graph, curation, session handoff, and MCP-compatible tool layer.

Engram / MCP bridge / workspace cache
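One way to give each workspace its own Engram schema inside a shared Postgres database is to hand each workspace runtime a DSN whose sessions default to that schema. This sketch assumes a hypothetical `ws_<slug>` naming rule and slug pattern. It relies only on libpq's `options` connection parameter (`-c search_path=...`), which psycopg passes through when it is given a URI-style DSN. It is not the service's actual code.

```python
import re
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

SLUG_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]{0,40}")  # hypothetical slug rule


def workspace_schema(slug: str) -> str:
    """Map a workspace slug to a Postgres schema name (hypothetical ws_<slug> rule)."""
    if not SLUG_PATTERN.fullmatch(slug):
        raise ValueError(f"invalid workspace slug: {slug!r}")
    # Hyphens are not valid in unquoted identifiers, so swap them for underscores.
    return "ws_" + slug.replace("-", "_")


def workspace_dsn(base_dsn: str, slug: str) -> str:
    """Return a libpq URI whose sessions start with search_path set to the workspace schema."""
    parts = urlsplit(base_dsn)
    params = dict(parse_qsl(parts.query))
    # libpq's `options` parameter sends server settings at connection start.
    params["options"] = f"-c search_path={workspace_schema(slug)}"
    # quote encodes the space as %20, which libpq percent-decodes;
    # quote_plus would emit "+", which libpq URIs do not treat as a space.
    return urlunsplit(parts._replace(query=urlencode(params, quote_via=quote)))


print(workspace_dsn("postgresql://app:secret@db:5432/memorylayer?sslmode=require", "team-a"))
```

Validating the slug before it reaches the DSN keeps user input from injecting extra server settings through `options`.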

Models used.

These are the memory models exposed by Engram's current config defaults, with the hosted service overriding storage to Postgres.

embedding

BAAI/bge-small-en-v1.5 creates dense vectors for semantic memory search.

dimension

384 dimensions per embedding vector.

reranker

cross-encoder/ms-marco-MiniLM-L-6-v2 is the cross-encoder used to rerank results when reranking is available.

backend

embedding_backend = auto; the hosted runtime runs CPU-only PyTorch inside its container.

storage

Engram's local default is sqlite, but Memorylayer sets storage_backend = postgres and injects a workspace schema DSN.

Boundaries that matter.

The hosted service does not pretend to be the memory system. It gives the memory system accounts, keys, URLs, and operational controls.


Account plane

GitHub OAuth, sessions, members, invites, and dashboard routes.

browser / human

Workspace plane

API keys, audit events, usage events, ingestion runs, exports, and connection kits.

operator / agent

Memory plane

Engram recall, remember, graph, curation, summaries, handoffs, and health checks.

runtime / engine

Request path.

A workspace API call has a small number of predictable checkpoints.

01 guard

Host, method, path, size, origin, rate limit, and security headers are handled before routing.

02 auth

Workspace routes resolve a Bearer token or X-API-Key header against the hashed API keys stored for that workspace slug.

03 audit

Successful and selected failed calls are recorded with route, key, status, and metadata.

04 engine

The call enters the workspace Engram runtime and returns memory results through a stable JSON shape.

Deployment shape.

Simple by design: one Python app, one Postgres database, one long-lived memory process model.

FastAPI

Public pages, workspace dashboard, OAuth, and JSON endpoints.

Postgres

App metadata plus per-workspace Engram schemas; each workspace's Postgres search_path is scoped to its own schema.

Docker

Repeatable VPS deployment using python:3.12-slim and a narrow exposed port.

Surface

2 starter skills, 5 playbooks, and 6 SDK snippets.

Runtime limits.

The app keeps the operational shape explicit so the memory process stays predictable.

cache

Up to 16 workspace runtimes are kept warm in-process.

ttl

Idle workspace runtimes are pruned after 1800 seconds.

payload

Default request body limit is 2,000,000 bytes (2 MB), enforced before routing.

api limit

Default API throttle is 240 requests per minute per client bucket.

auth limit

Default auth throttle is 12 requests per minute per client bucket.
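The cache, TTL, and throttle defaults above map onto two small data structures. The first is an LRU map of warm workspace runtimes with idle pruning. The second is a per-bucket request counter. This sketch assumes a sliding 60-second window for the throttle, since the page does not say whether the service uses a sliding window, a fixed window, or a token bucket. It also assumes a caller-supplied `factory` that builds a workspace runtime. The class names are hypothetical, and the clock is injectable so the behavior can be tested deterministically.

```python
import time
from collections import OrderedDict, deque
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class WarmRuntimeCache(Generic[T]):
    """LRU cache of per-workspace runtimes with idle-time pruning."""

    def __init__(self, factory: Callable[[str], T], max_size: int = 16,
                 idle_ttl: float = 1800.0, clock: Callable[[], float] = time.monotonic):
        self._factory = factory
        self._max_size = max_size
        self._idle_ttl = idle_ttl
        self._clock = clock
        self._entries: OrderedDict[str, tuple[T, float]] = OrderedDict()

    def get(self, slug: str) -> T:
        now = self._clock()
        self.prune(now)
        if slug in self._entries:
            runtime, _ = self._entries.pop(slug)
        else:
            runtime = self._factory(slug)          # cold start
        self._entries[slug] = (runtime, now)       # most recently used goes last
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)      # evict the least recently used
        return runtime

    def prune(self, now: float) -> None:
        """Drop runtimes idle for longer than the TTL."""
        idle = [s for s, (_, last) in self._entries.items() if now - last > self._idle_ttl]
        for slug in idle:
            del self._entries[slug]

    def __len__(self) -> int:
        return len(self._entries)


class MinuteRateLimiter:
    """Sliding-window limiter: at most `limit` requests per window per bucket."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._limit = limit
        self._window = window
        self._clock = clock
        self._hits: dict[str, deque[float]] = {}

    def allow(self, bucket: str) -> bool:
        now = self._clock()
        hits = self._hits.setdefault(bucket, deque())
        while hits and now - hits[0] >= self._window:
            hits.popleft()                         # forget requests outside the window
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True


api_limiter = MinuteRateLimiter(limit=240)   # default API throttle
auth_limiter = MinuteRateLimiter(limit=12)   # default auth throttle
```

Pruning only inside `get` keeps the sketch short. A long-lived process would likely also prune on a timer, so idle runtimes are released even when no new requests arrive.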