Data

Cephalon.Data is the runtime-neutral data layer for CephalonEngine apps. It defines read/write store contracts, command/query separation, and integration points; concrete adapters plug in specific backends.

This page is a decision guide and recipe book for choosing and using a data adapter. It covers how to enable the default stack, which backend fits which workload, common patterns with end-to-end examples, and per-backend notes on versions, drivers, gotchas, and known limits.

Packages overview

Package	Maturity	What it brings
`Cephalon.Data`	`M3`	Runtime-neutral data abstractions. Read/write store contracts, command/query split, outbox interface.
`Cephalon.Data.EntityFramework`	`M3`	EF Core integration — DbContext baseline, inbox/outbox storage, `Sfid.EntityFramework` value converter.
`Cephalon.Data.Postgres`	`M2`	Postgres-specific helpers via Npgsql (JSONB columns, hstore, listen/notify).
`Cephalon.Data.SqlServer`	`M2`	SQL Server / Azure SQL helpers (rowversion, temporal tables).
`Cephalon.Data.MySql`	`M2`	MySQL / MariaDB adapter (uses `MySqlConnector`).
`Cephalon.Data.Oracle`	`M2`	Oracle Database adapter (uses `Oracle.ManagedDataAccess`).
`Cephalon.Data.MongoDb`	`M2`	MongoDB adapter for document workloads.
`Cephalon.Data.Cassandra`	`M2`	Cassandra adapter (4.x+).
`Cephalon.Data.ClickHouse`	`M2`	ClickHouse analytics-database adapter.
`Cephalon.Data.Elasticsearch`	`M2`	Elasticsearch 8.x adapter.
`Cephalon.Data.OpenSearch`	`M2`	OpenSearch 2.x adapter.
`Cephalon.Data.Redis`	`M2`	Redis adapter (cache, sets, sorted sets, streams).
`Cephalon.Data.Neo4j`	`M2`	Neo4j graph adapter.
`Cephalon.Data.Qdrant`	`M2`	Qdrant vector-database adapter (used by `Cephalon.Retrieval`).
`Cephalon.Data.Nats`	`M2`	NATS JetStream stream / KV adapter.
`Cephalon.Data.Debezium`	`M2`	Debezium CDC source adapter.

Maturity reference: Reference → Architecture → Maturity audit.

How to enable

The default setup uses EF Core + Postgres. Configure it in two places: Program.cs and appsettings.json.

```csharp
builder.Services
    .AddCephalonAspNetCore()
    .AddData(options =>
    {
        options.UseEntityFramework();         // engine-wide EF default
        options.UsePostgres(builder.Configuration.GetConnectionString("Default")!);
        options.IdStrategy = IdStrategy.Sfid;  // Sfid (default), Guid, or Long
    })
    .AddModulesFromAssemblies(/* ... */);
```

```json
{
  "Engine": {
    "Data": {
      "IdStrategy": "Sfid",
      "Provider": "Postgres",
      "ReadModel": { "Provider": "Postgres" },
      "WriteModel": { "Provider": "Postgres" }
    }
  },
  "ConnectionStrings": {
    "Default": "Host=localhost;Port=5432;Database=acmestore;Username=postgres;Password=postgres"
  }
}
```

Choosing a backend

Workload	Recommended	Why
OLTP, relational with strong consistency	Postgres / SQL Server	ACID, mature EF Core support, broad tooling
Multi-tenant SaaS with row-level isolation	Postgres + RLS	Row-level security is enforced by the database, so a filter forgotten in app code doesn't leak another tenant's rows
Document-shaped data (variable schemas)	MongoDB	Native JSON, flexible indexes, change streams
Analytics / reporting / OLAP	ClickHouse	Columnar storage, fast aggregations
Full-text search	Elasticsearch / OpenSearch	Inverted indexes, scoring, faceted search
Cache layer	Redis	Sub-ms latency, pub/sub, atomic ops
Time-series (metrics, IoT)	ClickHouse or Postgres + TimescaleDB	Both work; ClickHouse for write-heavy ingestion, Postgres + TimescaleDB for Postgres-compatible SQL access
Graph relationships	Neo4j	Native graph queries (Cypher)
Vector search / embeddings	Qdrant	Purpose-built for similarity search
Distributed write-heavy	Cassandra	Tunable consistency, multi-region
Event sourcing	Postgres + EF Core outbox + Kafka/NATS	Aggregate-root pattern + reliable delivery
CDC source for downstream	Debezium + Postgres / SQL Server	Battle-tested change capture

For multi-backend apps (write-side OLTP + read-side OLAP), use the read/write split pattern below.

Common patterns

Pattern 1: simple EF Core + Postgres CRUD

```csharp
using Cephalon.Data.EntityFramework;
using Microsoft.EntityFrameworkCore;

public sealed class ProductsDbContext(DbContextOptions<ProductsDbContext> options)
    : CephalonDbContext(options)
{
    public DbSet<Product> Products => Set<Product>();

    protected override void OnModelCreating(ModelBuilder b)
    {
        base.OnModelCreating(b);
        b.Entity<Product>(e =>
        {
            e.ToTable("products");
            e.HasKey(p => p.Id);
            e.Property(p => p.Name).HasMaxLength(200).IsRequired();
            e.Property(p => p.Sku).HasMaxLength(64).IsRequired();
            e.HasIndex(p => p.Sku).IsUnique();
            e.Property(p => p.Price).HasColumnType("numeric(12,2)");
        });
    }
}
```

```csharp
public sealed class ProductsModule : RestBehaviorModuleBase
{
    public override ModuleDescriptor Describe() => new(
        "Acme.Store.Modules.Products", "1.0.0", [Capability.Data]);

    public override void RegisterServices(IServiceCollection services)
    {
        services.AddCephalonEntityFramework<ProductsDbContext>((sp, opts) =>
        {
            // Reads ConnectionStrings:Products, so add a "Products" entry next to
            // "Default" in appsettings.json (or point this at "Default").
            var conn = sp.GetRequiredService<IConfiguration>().GetConnectionString("Products");
            opts.UseNpgsql(conn);
        });
        services.AddScoped<IProductCatalog, EfProductCatalog>();
    }
    /* ... */
}
```

Full walkthrough: Tutorial → First-app step 3: Wire EF Core.

Pattern 2: read/write split (OLTP + OLAP)

Writes go to Postgres; reads come from ClickHouse. The module registers two services with distinct intent: one for transactional writes, one for analytics queries.

```csharp
public override void RegisterServices(IServiceCollection services)
{
    // Write side (Postgres) for transactional operations
    services.AddCephalonEntityFramework<OrdersDbContext>((sp, opts) =>
        opts.UseNpgsql(sp.GetRequiredService<IConfiguration>().GetConnectionString("OrdersWrite")));

    // Read side (ClickHouse) for analytics queries
    services.AddCephalonClickHouse((sp, opts) =>
        opts.UseConnection(sp.GetRequiredService<IConfiguration>().GetConnectionString("Analytics")));

    services.AddScoped<IOrderWriter, EfOrderWriter>();
    services.AddScoped<IOrderAnalytics, ClickHouseOrderAnalytics>();
}
```

A separate background job (or CDC pipeline via Debezium) keeps ClickHouse fed from Postgres.
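A minimal sketch of the background-job route, assuming each write-side row carries a monotonically increasing sequence number. `OrderRow` and `IncrementalSync` are hypothetical, and in-memory lists stand in for the Postgres source and the ClickHouse sink; a real job would query `WHERE seq > @watermark ORDER BY seq LIMIT @batch`, insert the batch into ClickHouse, and persist the watermark.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: Seq is a monotonically increasing change sequence
// (for example a bigserial column on the write side).
public sealed record OrderRow(long Seq, string OrderId, decimal Total);

// Copies rows newer than the last synced watermark, one bounded batch per call.
public sealed class IncrementalSync
{
    private readonly IReadOnlyList<OrderRow> _source;   // stand-in for the Postgres table
    private readonly List<OrderRow> _sink;              // stand-in for the ClickHouse table
    private readonly int _batchSize;

    public long Watermark { get; private set; }         // persist this in real code

    public IncrementalSync(IReadOnlyList<OrderRow> source, List<OrderRow> sink, int batchSize)
    {
        _source = source;
        _sink = sink;
        _batchSize = batchSize;
    }

    // Returns the number of rows copied; 0 means the sink has caught up.
    public int RunOnce()
    {
        var batch = _source
            .Where(r => r.Seq > Watermark)
            .OrderBy(r => r.Seq)
            .Take(_batchSize)
            .ToList();

        if (batch.Count == 0) return 0;

        _sink.AddRange(batch);          // one batched write, not row-by-row
        Watermark = batch[^1].Seq;      // advance only after the write succeeded
        return batch.Count;
    }
}
```

One caveat the sketch glosses over: in Postgres, a transaction that took a lower sequence value can commit after one that took a higher value, so a pure watermark scan can skip rows. That ordering gap is one reason the Debezium CDC route is usually more robust.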

Pattern 3: cache-aside with Redis

Decorate a slow query with a Redis cache layer:

```csharp
public sealed class CachedProductCatalog(
    IProductCatalog inner,      // the decorated EfProductCatalog
    IRedisCache cache) : IProductCatalog
{
    public async Task<Product?> FindAsync(Sfid id, CancellationToken ct)
    {
        var key = $"products:{id}";
        if (await cache.GetAsync<Product>(key, ct) is { } cached) return cached;   // cache hit

        var product = await inner.FindAsync(id, ct);                                // miss: load from the database
        if (product is not null)
            await cache.SetAsync(key, product, TimeSpan.FromMinutes(10), ct);       // populate with a 10-minute TTL

        return product;
    }
}

// Register the decorator. Decorate comes from the Scrutor package and wraps the
// existing IProductCatalog registration, passing it in as `inner`.
services.AddScoped<IProductCatalog, EfProductCatalog>();
services.Decorate<IProductCatalog, CachedProductCatalog>();
```

Pattern 4: tenant-scoped DbContext

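Resolve the tenant when building the DbContextOptions: pick the tenant's connection and add an interceptor that scopes every query to that tenant, so tenant filtering doesn't depend on each query remembering its own WHERE clause.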

```csharp
services.AddCephalonEntityFramework<AppDb>((sp, opts) =>
{
    var tenant = sp.GetRequiredService<ITenantContext>();
    var conn   = sp.GetRequiredService<ITenantConnectionResolver>().Resolve(tenant);
    opts.UseNpgsql(conn);
    opts.AddInterceptors(new TenantRlsInterceptor(tenant));   // injects tenant_id into every query
});
```

See Tutorial → Multi-tenant SaaS for the full pattern.

Pattern 5: outbox + eventing

EF Core writes domain rows + outbox rows in the same transaction; Wolverine drains the outbox.

```csharp
public async Task<Order> PlaceOrderAsync(PlaceOrderInput input, CancellationToken ct)
{
    var order = new Order { /* … */ };
    db.Orders.Add(order);

    await publisher.PublishAsync(
        new OrderPlaced(order.Id, order.CustomerId, order.Total),
        ct);   // goes into the outbox, not directly onto the bus

    await db.SaveChangesAsync(ct);  // single transaction: order row + outbox row
    return order;
}
```

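The outbox guarantees at-least-once delivery even if the broker is down during commit: the message waits in the outbox until the drainer can publish it. Because a message can therefore arrive more than once, consumers must be idempotent (see the inbox notes under Eventing-data interaction). See Technology → Eventing.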
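Why at-least-once rather than exactly-once? A minimal in-memory sketch of a relay loop makes it visible. `OutboxMessage` and `OutboxRelay` are hypothetical, the `tryPublish` delegate stands in for a broker client, and Wolverine's real drainer does considerably more (locking, retries, batching). A message is marked sent only after the broker accepts it, so a crash between those two steps means it is published again on the next pass.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical outbox row, written in the same transaction as the domain row.
public sealed class OutboxMessage
{
    public Guid Id { get; } = Guid.NewGuid();
    public string Payload { get; }
    public DateTimeOffset? SentAt { get; set; }
    public OutboxMessage(string payload) => Payload = payload;
}

public sealed class OutboxRelay
{
    private readonly List<OutboxMessage> _outbox;
    private readonly Func<OutboxMessage, bool> _tryPublish;  // stand-in for the broker client

    public OutboxRelay(List<OutboxMessage> outbox, Func<OutboxMessage, bool> tryPublish)
    {
        _outbox = outbox;
        _tryPublish = tryPublish;
    }

    // One drain pass: publish unsent messages in insertion order.
    // Returns how many the broker confirmed.
    public int DrainOnce()
    {
        var sent = 0;
        foreach (var msg in _outbox)
        {
            if (msg.SentAt is not null) continue;
            if (!_tryPublish(msg)) break;          // broker down: stop, retry on the next pass
            msg.SentAt = DateTimeOffset.UtcNow;    // mark only after the broker accepted it
            sent++;
        }
        return sent;
    }
}
```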

Pattern 6: integration tests with Testcontainers

```csharp
using System.Threading.Tasks;
using Testcontainers.PostgreSql;   // Testcontainers.PostgreSql package
using Xunit;                       // IAsyncLifetime (xUnit v2)

public sealed class PostgresFixture : IAsyncLifetime
{
    public string ConnectionString { get; private set; } = string.Empty;
    private PostgreSqlContainer _container = null!;

    // Start a throwaway Postgres 16 container before the tests run.
    public async Task InitializeAsync()
    {
        _container = new PostgreSqlBuilder()
            .WithImage("postgres:16-alpine")
            .WithDatabase("acmestore")
            .Build();
        await _container.StartAsync();
        ConnectionString = _container.GetConnectionString();
    }

    // Stop and remove the container once the tests finish.
    public Task DisposeAsync() => _container.DisposeAsync().AsTask();
}
```

Full walkthrough: Tutorial → First-app step 7: Tests.

Per-backend notes

Postgres (`Cephalon.Data` + Npgsql)

Aspect	Detail
Min version	14+ (uses `INCLUDE` indexes, generated columns; older versions may work but aren’t gated by CI).
Driver	`Npgsql.EntityFrameworkCore.PostgreSQL` (community-maintained).
JSONB	EF Core supports `Property<JsonDocument>(…)` mapping.
Gotchas	Connection-pooling defaults are conservative; raise `Maximum Pool Size` for high concurrency. `Sfid` keys map to `char(26)` or `text`; index them. Adding a non-nullable column to a big table needs an explicit multi-step migration: add it as nullable, backfill in batches, then set `NOT NULL`.
Limits	`Cephalon.Data.Postgres` JSONB query helpers are `M2` — they work but may evolve.

SQL Server (`Cephalon.Data` + `Microsoft.Data.SqlClient`)

Aspect	Detail
Min version	2019+ for full feature set. Azure SQL fully supported.
Driver	`Microsoft.EntityFrameworkCore.SqlServer`.
Temporal tables	EF Core 6+ supports `entity.ToTable(b => b.IsTemporal())`, and it works with CephalonEngine.
Gotchas	Sfid stored as `nvarchar(26)` performs slightly worse than Postgres `char(26)`; make Sfid the clustered key only if your access pattern orders by id. SQL Server's `MERGE` has known concurrency issues; prefer explicit `INSERT`/`UPDATE` paths.
Limits	Native row-level security needs SQL Server `2016+` or Azure SQL.

MySQL / MariaDB (`Cephalon.Data.MySql`)

Aspect	Detail
Min version	MySQL 8.0+, MariaDB 10.6+.
Driver	`MySqlConnector` (recommended) or Oracle’s `MySql.Data` (legacy).
JSON columns	Supported via `Property<JsonElement>(…)` mapping.
Gotchas	Default collation in MySQL 8 is `utf8mb4_0900_ai_ci` — case- and accent-insensitive. Use `utf8mb4_bin` for case-sensitive identifiers. Window functions require 8.0+.

MongoDB (`Cephalon.Data.MongoDb`)

Aspect	Detail
Min version	6.0+ for full transaction support.
Driver	`MongoDB.EntityFrameworkCore` (official, in preview) or raw `MongoDB.Driver`.
Pattern	Use Mongo for document-shaped data; relational data still belongs in Postgres / SQL Server.
Gotchas	EF Core integration is preview-quality. Raw driver is more stable. Sfid serializes as `string` — index it like any other id. Multi-document transactions require a replica set, not standalone Mongo.

Redis (`Cephalon.Data.Redis`)

Aspect	Detail
Min version	6.2+ (ACL support, sorted-set commands).
Driver	`StackExchange.Redis`.
Use for	Cache, distributed locks, rate limiting, pub/sub, leaderboards.
Not for	Long-term storage. Treat Redis as ephemeral.
Gotchas	Create one `ConnectionMultiplexer` and register it as a singleton; don't create one per request. Avoid `KEYS *` in production because it blocks the server; use `SCAN` instead.

Cassandra (`Cephalon.Data.Cassandra`)

Aspect	Detail
Min version	4.0+.
Driver	`CassandraCSharpDriver`.
Use for	Distributed write-heavy workloads where eventual consistency is acceptable.
Gotchas	Schema design ≠ relational. Model around your queries, not your domain. No joins — denormalize aggressively. Tunable consistency (`ONE`, `QUORUM`, `ALL`) — pick per query, not globally.

ClickHouse (`Cephalon.Data.ClickHouse`)

Aspect	Detail
Min version	23.8+.
Driver	`ClickHouse.Client`.
Use for	Analytics, time-series, write-heavy logging.
Pattern	Pair with Postgres for OLTP write side; ClickHouse as the read-side analytics store.
Gotchas	Inserts are batch-oriented — accumulate before flushing; single-row inserts are slow. ClickHouse doesn’t have classic UPDATE semantics — use `ReplacingMergeTree` engine if you need updates.
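The usual fix for slow single-row inserts is to buffer rows and flush by size or age. A minimal, thread-safe sketch; `BatchBuffer<T>` is hypothetical, and the `flush` delegate is where a real implementation would send one multi-row `INSERT` through the driver. Age is only checked when a row arrives, so a timer should also call `Flush()`.

```csharp
using System;
using System.Collections.Generic;

// Accumulates rows and hands them to `flush` in batches instead of one insert per row.
public sealed class BatchBuffer<T>
{
    private readonly List<T> _pending = new();
    private readonly object _gate = new();
    private readonly int _maxRows;
    private readonly TimeSpan _maxAge;
    private readonly Action<IReadOnlyList<T>> _flush;  // one multi-row INSERT per call
    private DateTime _oldestUtc;

    public BatchBuffer(int maxRows, TimeSpan maxAge, Action<IReadOnlyList<T>> flush)
    {
        _maxRows = maxRows;
        _maxAge = maxAge;
        _flush = flush;
    }

    // nowUtc is passed in (instead of read from the clock) so the policy is testable.
    public void Add(T row, DateTime nowUtc)
    {
        List<T>? batch = null;
        lock (_gate)
        {
            if (_pending.Count == 0) _oldestUtc = nowUtc;
            _pending.Add(row);
            if (_pending.Count >= _maxRows || nowUtc - _oldestUtc >= _maxAge)
                batch = TakeAll();
        }
        if (batch is not null) _flush(batch);   // flush outside the lock
    }

    // Call from a timer and on shutdown so a partial batch isn't lost.
    public void Flush()
    {
        List<T>? batch;
        lock (_gate) batch = _pending.Count > 0 ? TakeAll() : null;
        if (batch is not null) _flush(batch);
    }

    private List<T> TakeAll()
    {
        var batch = new List<T>(_pending);
        _pending.Clear();
        return batch;
    }
}
```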

Elasticsearch / OpenSearch (`Cephalon.Data.Elasticsearch`, `Cephalon.Data.OpenSearch`)

Aspect	Detail
Min version	Elasticsearch 8.x, OpenSearch 2.x.
Driver	`Elastic.Clients.Elasticsearch` for ES; `OpenSearch.Client` for OS.
Use for	Full-text search, faceted queries, log aggregation.
Pattern	Source of truth in Postgres / SQL Server; project into ES/OS via background indexer or CDC.
Gotchas	Schema (mapping) is mostly fixed — design carefully; reindexing is expensive. Refresh interval defaults to 1s; tune it (e.g. 30s) for write-heavy use.

Neo4j (`Cephalon.Data.Neo4j`)

Aspect	Detail
Min version	5.x.
Driver	`Neo4j.Driver`.
Use for	Relationship-heavy queries (recommendation, fraud, social, dependency analysis).
Pattern	Replicate identifiers from your relational DB; store only the graph relationships in Neo4j.

Qdrant (`Cephalon.Data.Qdrant`)

Aspect	Detail
Min version	1.7+.
Driver	`Qdrant.Client`.
Use for	Vector search, semantic retrieval (RAG), recommendation.
Pattern	Used by `Cephalon.Retrieval`. Pair with embeddings from OpenAI / Cohere / local models.
Gotchas	Vectors must match the embedding model’s dimension (e.g. 1536 for `text-embedding-3-small`). Use scalar quantization for large collections (>1M vectors) — saves memory at ~2% recall cost.
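Scalar quantization stores each float32 component in one byte, so a 1536-dimension vector shrinks from 6,144 bytes to 1,536. Qdrant applies it server-side when you enable scalar quantization on a collection; the sketch below only illustrates the idea with a plain min/max linear mapping and is not Qdrant's implementation. `ScalarQuantizer` is a hypothetical helper.

```csharp
using System;

// Illustrative byte-per-component scalar quantization over a fixed [min, max] range.
public static class ScalarQuantizer
{
    public static byte[] Quantize(float[] v, float min, float max)
    {
        if (max <= min) throw new ArgumentException("max must be greater than min");
        var scale = 255f / (max - min);
        var q = new byte[v.Length];
        for (var i = 0; i < v.Length; i++)
        {
            var clamped = Math.Clamp(v[i], min, max);           // outliers saturate at the range ends
            q[i] = (byte)MathF.Round((clamped - min) * scale);  // 0..255
        }
        return q;
    }

    public static float[] Dequantize(byte[] q, float min, float max)
    {
        var step = (max - min) / 255f;
        var v = new float[q.Length];
        for (var i = 0; i < q.Length; i++) v[i] = min + q[i] * step;
        return v;
    }
}
```

The recall cost comes from rounding: within the range, each reconstructed component is off by at most half a quantization step, (max - min) / 510.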

NATS / JetStream (`Cephalon.Data.Nats`)

Aspect	Detail
Min version	NATS 2.10+ with JetStream enabled.
Driver	`NATS.Client.Core` + `NATS.Client.JetStream`.
Use for	Persistent streams, KV store, eventing with replay.
Pattern	Can serve as the eventing transport (alternative to RabbitMQ / Kafka).

Debezium (`Cephalon.Data.Debezium`)

Aspect	Detail
Min version	Debezium 2.x.
Use for	Reliably capturing changes from existing databases into a stream (Kafka / Pulsar / NATS).
Pattern	Set up Debezium connector against Postgres logical replication slot → Kafka topic → Wolverine consumer.
Limits	Debezium runs outside the engine — `Cephalon.Data.Debezium` provides the consumer side for the events Debezium emits.

Migrations

EF Core migrations are the standard path:

```bash
# Add a migration
cd src/Acme.Store.Modules.Products
dotnet ef migrations add AddProducts --context ProductsDbContext --output-dir Data/Migrations

# Apply migrations (dev)
dotnet ef database update --context ProductsDbContext
```

In production, don’t apply migrations on startup. Use a dedicated migrator job:

```bash
dotnet ef migrations script --idempotent --output ./migrations/products-v1.2.0.sql
```

…then run that SQL through your normal deploy pipeline (Flyway, Liquibase, or a Kubernetes Job).

Limits & gotchas (cross-cutting)

Cephalon.Data itself doesn’t ship a connection pool. That’s the driver’s job; tune at the connection-string level.
Mixing Sfid and Guid key columns in the same DbContext is not recommended; pick one strategy per DbContext.
The data-layer’s outbox is EF-specific. If you don’t use EF Core, you need to bring your own outbox table and the eventing layer’s IOutboxStore implementation.
No multi-master writes. The engine assumes a primary-write topology. Multi-region active-active needs Cassandra or a custom approach.
Tenant sharding (one DB per tenant) is supported via the ITenantConnectionResolver pattern but not via a built-in router — you wire it.
Cephalon.Data.* adapters are M2. They work for the documented happy paths; edge cases (cross-database transactions, exotic types) may not be covered. Track maturity per package in Reference → Maturity audit.
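Since the per-tenant router is yours to wire, here is a minimal sketch of the lookup it needs. The engine's `ITenantConnectionResolver` and `ITenantContext` signatures aren't shown on this page, so the sketch defines its own hypothetical `ITenantConnectionLookup` interface and `TenantConnectionMap` class; adapt them to the real engine contracts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical contract: resolve a tenant id to that tenant's database.
public interface ITenantConnectionLookup
{
    string Resolve(string tenantId);
}

// Static map (e.g. loaded from configuration). Unknown tenants fail closed
// instead of silently falling back to a shared database.
public sealed class TenantConnectionMap : ITenantConnectionLookup
{
    private readonly IReadOnlyDictionary<string, string> _byTenant;

    public TenantConnectionMap(IReadOnlyDictionary<string, string> byTenant) =>
        _byTenant = byTenant;

    public string Resolve(string tenantId) =>
        _byTenant.TryGetValue(tenantId, out var conn)
            ? conn
            : throw new InvalidOperationException($"No database configured for tenant '{tenantId}'.");
}
```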

Tips & tricks

Practical data-layer guidance from production usage.

EF Core tips

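Use AsNoTracking() for read-only entity queries. Change tracking keeps a snapshot of every materialized entity, which you don't need if you won't call SaveChanges; turn it off explicitly. (Projections to non-entity DTOs aren't tracked in the first place.)
```csharp
await db.Products.AsNoTracking().Where(p => p.Active).ToListAsync(ct);
```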
Use IExecutionStrategy for transient retries on cloud DBs. Azure SQL / AWS RDS occasionally drop connections; EF Core’s strategy hides the retry from your handler code.
Project to DTOs in the database, not in memory. Select(p => new ProductDto(p.Id, p.Name)) runs as SQL; ToList().Select(...) fetches every column.
Compile queries on hot paths. EF Core already caches translated queries, but EF.CompileAsyncQuery also skips the cache lookup. Separately, dotnet ef dbcontext optimize generates a compiled model, which mainly cuts startup time for apps with large models.
Use ChangeTracker.AutoDetectChangesEnabled = false inside bulk-import loops; toggle back on at the end.

Connection pooling

Set Maximum Pool Size on the connection string to match your concurrency. Npgsql defaults to 100; raise it if you see pool-exhausted or pool-timeout errors, and keep the total across all app instances below the server's max_connections.
Warm the pool at startup: open and immediately close a connection in OnStart so the first request doesn't pay the connection and TLS-handshake cost.
Watch for connection leaks: a DbContext or raw DbConnection that you new up and never dispose can keep a pooled connection checked out. Use DI scoping (AddDbContext) so the container handles lifetime.
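Pool bounds live on the connection string. A minimal sketch using the BCL's provider-agnostic `DbConnectionStringBuilder`; in a real app you'd more likely use the driver's typed builder (for Npgsql, `NpgsqlConnectionStringBuilder.MaxPoolSize`). `PoolSettings.WithPoolSize` is hypothetical, and the values in the usage are illustrative, not recommendations.

```csharp
using System.Data.Common;

public static class PoolSettings
{
    // Returns a copy of `connectionString` with explicit pool bounds.
    // Keyword names follow Npgsql ("Minimum Pool Size" / "Maximum Pool Size").
    public static string WithPoolSize(string connectionString, int min, int max)
    {
        var b = new DbConnectionStringBuilder { ConnectionString = connectionString };
        b["Minimum Pool Size"] = min;   // idle connections kept open between bursts
        b["Maximum Pool Size"] = max;   // upper bound per app instance
        return b.ConnectionString;
    }
}
```

For example, `PoolSettings.WithPoolSize("Host=localhost;Database=acmestore", 5, 200)` yields a string with both pool keywords set; keyword matching in Npgsql is case-insensitive, so the builder's key casing doesn't matter.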

Migrations in production

Never run dotnet ef database update in production. Generate idempotent SQL scripts: dotnet ef migrations script --idempotent and apply via your normal SQL deploy pipeline.
Separate “schema migration” from “data migration”. Schema migrations are reversible; data migrations often aren’t. Run them in different deploy windows.
Online schema changes: never ALTER TABLE ADD COLUMN NOT NULL on a big table directly. Three-step: add nullable column → backfill in batches → set NOT NULL.
Tag migrations with a Jira / GitHub issue ID in the migration name (AddProductsTable_AS-1234). Makes git blame meaningful 18 months later.

Naming

Item	Convention	Example
Table name	`snake_case_plural` (Postgres) or `PascalCasePlural` (SQL Server)	`products`, `Products`
Column name	`snake_case` (Postgres) or `PascalCase` (SQL Server)	`created_at`, `CreatedAt`
Index name	`ix_<table>_<col>` or `<Table>_<Col>_IX`	`ix_products_sku`
Foreign key	`fk_<table>_<referenced>`	`fk_orders_customers`
DbContext class	`<Module>DbContext`	`ProductsDbContext`

Backend-specific tricks

Postgres: use INCLUDE indexes for index-only scans. Postgres 11+ supports CREATE INDEX … ON … INCLUDE (col1, col2). Lets queries hit the index without touching the heap.
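Postgres: enable the pg_stat_statements extension (it must be listed in shared_preload_libraries); it records per-statement timing, so slow queries surface without extra logging.
SQL Server: avoid nvarchar(MAX) for fields that rarely exceed 4000 chars; use a bounded type such as nvarchar(4000). MAX columns can't be index key columns at all, and an index key over 1,700 bytes (850 nvarchar characters) still fails at insert time, so size indexed columns accordingly.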
MongoDB: index every field you query / sort by. Mongo defaults to full-collection scan if no index exists.
Redis: MEMORY USAGE <key> shows per-key memory. Useful when you suspect a key is bloating the cache.
ClickHouse: order columns by access pattern — ORDER BY (tenant_id, created_at) for tenant-scoped time-series queries.
Elasticsearch: use _source filtering on every search to avoid shipping the entire doc back over the network when you only need a few fields.

Cache patterns

Cache the slow query, not the whole entity. Smaller cache footprint, less churn.
Prefer TTL expiry over “cache forever + invalidate” in most cases. Invalidation is hard; a short TTL with a fallback to the source of truth covers most needs.
Stampede prevention: when many requests hit a cache miss on the same key at once, only one should refresh it. Use a SemaphoreSlim keyed per cache key, or Redis-based locks.
Cache-aside (the app checks the cache, loads from the database on a miss, and writes the result back) is simpler than write-through. Stick with cache-aside unless you have a specific reason.
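A minimal in-process sketch of cache-aside with a TTL and per-key stampede protection via `SemaphoreSlim`, following the tips above. `StampedeSafeCache<TValue>` is hypothetical and keeps entries in a local dictionary; across several app instances the same shape would use Redis for both the cache and the lock.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Cache-aside with a TTL; concurrent misses on the same key share one refresh.
public sealed class StampedeSafeCache<TValue>
{
    private readonly ConcurrentDictionary<string, (TValue Value, DateTimeOffset ExpiresAt)> _entries = new();
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _locks = new();
    private readonly TimeSpan _ttl;

    public StampedeSafeCache(TimeSpan ttl) => _ttl = ttl;

    public async Task<TValue> GetOrLoadAsync(
        string key, Func<CancellationToken, Task<TValue>> load, CancellationToken ct = default)
    {
        if (TryGetFresh(key, out var hit)) return hit;          // fast path: no locking on a hit

        var gate = _locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync(ct);
        try
        {
            // Another caller may have refreshed the entry while we waited.
            if (TryGetFresh(key, out hit)) return hit;

            var value = await load(ct);                         // only one caller per key reaches the source
            _entries[key] = (value, DateTimeOffset.UtcNow + _ttl);
            return value;
        }
        finally
        {
            gate.Release();
        }
    }

    private bool TryGetFresh(string key, out TValue value)
    {
        if (_entries.TryGetValue(key, out var e) && e.ExpiresAt > DateTimeOffset.UtcNow)
        {
            value = e.Value;
            return true;
        }
        value = default!;
        return false;
    }
}
```

The per-key semaphores are never removed, which is fine for a bounded key space but would need pruning if keys are unbounded.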

Eventing-data interaction

Outbox table needs an index on (processed_at, created_at) so the Wolverine drainer finds unprocessed messages quickly.
Purge outbox rows; don't just mark them processed. Run a cleanup job (for example, a scheduled DELETE of rows processed more than N days ago) to keep the table small. A 100M-row outbox kills performance.
Inbox table for at-least-once dedup: put a unique index on (message_id, handler_name) and keep rows for ~30 days. Past that window, dedup falls back to the handler's natural idempotency.
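The inbox check reduces to “has this handler already seen this message id within the retention window?”. A minimal in-memory sketch; `InboxDeduplicator` is hypothetical and not thread-safe. In the real table the pair is a unique index, and recording the message shares a transaction with the handler's own writes so the two can't diverge.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for an inbox table keyed by (message_id, handler_name).
public sealed class InboxDeduplicator
{
    private readonly Dictionary<(Guid MessageId, string Handler), DateTimeOffset> _seen = new();
    private readonly TimeSpan _retention;

    public InboxDeduplicator(TimeSpan retention) => _retention = retention;

    // True the first time a handler sees a message; false for a redelivery.
    public bool TryBegin(Guid messageId, string handler, DateTimeOffset now)
    {
        var key = (messageId, handler);
        if (_seen.TryGetValue(key, out var firstSeen) && now - firstSeen < _retention)
            return false;
        _seen[key] = now;
        return true;
    }

    // The cleanup job: drop entries older than the retention window.
    public int Purge(DateTimeOffset now)
    {
        var expired = new List<(Guid, string)>();
        foreach (var (key, firstSeen) in _seen)
            if (now - firstSeen >= _retention) expired.Add(key);
        foreach (var key in expired) _seen.Remove(key);
        return expired.Count;
    }
}
```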

Performance heuristics

One DB round-trip per request, on the happy path. If a single endpoint hits the DB 5 times, that’s a refactor candidate.
Batch inserts — single inserts are ~10× slower than batched. Use AddRange + single SaveChanges.
Take(N) always, even on filtered queries. Defensive — a future bug that removes the filter shouldn’t OOM the host.
Profile before optimizing. Use dotnet-counters for runtime stats, dotnet-trace for hot paths. Optimizing blind almost always picks the wrong thing.

Anti-patterns

Don’t	Do
Generic repository pattern (`IRepository<T>`)	Module-specific repository contracts (`IProductCatalog`) — domain-meaningful methods
Lazy-loading navigation properties	Explicit `Include(...)` or projected DTOs — predictable SQL
`SaveChanges` after every write	Group related writes in one transaction (one `SaveChanges` per unit-of-work)
Storing JSON blobs to “save schema work”	Use real columns or a typed JSONB approach — easier to query, index, migrate
Mixing read-side and write-side in the same DbContext for non-trivial apps	Separate them; the engine’s read/write split pattern is there for a reason
Time-zone arithmetic in app code	Store UTC, convert at the edge (browser / API caller)
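A minimal sketch of “store UTC, convert at the edge” using `DateTimeOffset`. `EdgeTime` is hypothetical, and the fixed offset in the usage stands in for what a real API would usually derive from the caller's IANA time-zone id via `TimeZoneInfo`.

```csharp
using System;
using System.Globalization;

public static class EdgeTime
{
    // What goes into the database: the same instant, normalized to UTC.
    public static DateTimeOffset ToStorage(DateTimeOffset anyOffset) => anyOffset.ToUniversalTime();

    // What goes out to a caller: the stored instant rendered in their offset.
    public static string ForDisplay(DateTimeOffset storedUtc, TimeSpan callerOffset) =>
        storedUtc.ToOffset(callerOffset).ToString("yyyy-MM-dd HH:mm zzz", CultureInfo.InvariantCulture);
}
```

For example, 09:30 at +02:00 is stored as 07:30 UTC and shown to a caller at -05:00 as `02:30 -05:00`; all three values denote the same instant.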

Source-doc snapshots

The engine-side snapshots from the engine repo (authoritative when in doubt):

Where to go next

Tutorial → First-app step 3: Wire EF Core — complete EF Core walkthrough.
Technology → Identifiers — Sfid in depth.
Technology → Eventing — outbox + Wolverine integration.
Tutorial → Multi-tenant SaaS — per-tenant data sharding.
Reference → Architecture → Conformance matrix — which adapters expose which engine surfaces.