Skip to content

Data

Cephalon.Data is the runtime-neutral data layer for CephalonEngine apps. It defines read/write store contracts, command/query separation, and integration points; concrete adapters bring specific backends.

This page is the decision guide + recipe book for choosing and using a data adapter. Each backend has a “when to choose”, “how to enable”, an end-to-end example, and known limits.

PackageMaturityWhat it brings
Cephalon.DataM3Runtime-neutral data abstractions. Read/write store contracts, command/query split, outbox interface.
Cephalon.Data.EntityFrameworkM3EF Core integration — DbContext baseline, inbox/outbox storage, Sfid.EntityFramework value converter.
Cephalon.Data.PostgresM2Postgres-specific helpers via Npgsql (JSONB columns, hstore, listen/notify).
Cephalon.Data.SqlServerM2SQL Server / Azure SQL helpers (rowversion, temporal tables).
Cephalon.Data.MySqlM2MySQL / MariaDB adapter (uses MySqlConnector).
Cephalon.Data.OracleM2Oracle Database adapter (uses Oracle.ManagedDataAccess).
Cephalon.Data.MongoDbM2MongoDB adapter for document workloads.
Cephalon.Data.CassandraM2Cassandra adapter (4.x+).
Cephalon.Data.ClickHouseM2ClickHouse analytics-database adapter.
Cephalon.Data.ElasticsearchM2Elasticsearch 8.x adapter.
Cephalon.Data.OpenSearchM2OpenSearch 2.x adapter.
Cephalon.Data.RedisM2Redis adapter (cache, sets, sorted sets, streams).
Cephalon.Data.Neo4jM2Neo4j graph adapter.
Cephalon.Data.QdrantM2Qdrant vector-database adapter (used by Cephalon.Retrieval).
Cephalon.Data.NatsM2NATS JetStream stream / KV adapter.
Cephalon.Data.DebeziumM2Debezium CDC source adapter.

Maturity reference: Reference → Architecture → Maturity audit.

The default flow uses EF Core + Postgres. Set in two places: Program.cs and appsettings.json.

src/Acme.Store.Host/Program.cs
builder.Services
.AddCephalonAspNetCore()
.AddData(options =>
{
options.UseEntityFramework(); // engine-wide EF default
options.UsePostgres(builder.Configuration.GetConnectionString("Default")!);
options.IdStrategy = IdStrategy.Sfid; // Sfid (default), Guid, or Long
})
.AddModulesFromAssemblies(/* ... */);
appsettings.json
{
"Engine": {
"Data": {
"IdStrategy": "Sfid",
"Provider": "Postgres",
"ReadModel": { "Provider": "Postgres" },
"WriteModel": { "Provider": "Postgres" }
}
},
"ConnectionStrings": {
"Default": "Host=localhost;Port=5432;Database=acmestore;Username=postgres;Password=postgres"
}
}
WorkloadRecommendedWhy
OLTP, relational with strong consistencyPostgres / SQL ServerACID, mature EF Core support, broad tooling
Multi-tenant SaaS with row-level isolationPostgres + RLSPostgres row-level security beats application-only enforcement
Document-shaped data (variable schemas)MongoDBNative JSON, flexible indexes, change streams
Analytics / reporting / OLAPClickHouseColumnar storage, fast aggregations
Full-text searchElasticsearch / OpenSearchInverted indexes, scoring, faceted search
Cache layerRedisSub-ms latency, pub/sub, atomic ops
Time-series (metrics, IoT)ClickHouse + TimescaleDBBoth work; ClickHouse for write-heavy, Postgres+Timescale for SQL access
Graph relationshipsNeo4jNative graph queries (Cypher)
Vector search / embeddingsQdrantPurpose-built for similarity search
Distributed write-heavyCassandraTunable consistency, multi-region
Event sourcingPostgres + EF Core outbox + Kafka/NATSAggregate-root pattern + reliable delivery
CDC source for downstreamDebezium + Postgres / SQL ServerBattle-tested change capture

For multi-backend apps (write-side OLTP + read-side OLAP), use the read/write split pattern below.

src/Acme.Store.Modules.Products/Data/ProductsDbContext.cs
using Cephalon.Data.EntityFramework;
using Microsoft.EntityFrameworkCore;
public sealed class ProductsDbContext(DbContextOptions<ProductsDbContext> options)
: CephalonDbContext(options)
{
public DbSet<Product> Products => Set<Product>();
protected override void OnModelCreating(ModelBuilder b)
{
base.OnModelCreating(b);
b.Entity<Product>(e =>
{
e.ToTable("products");
e.HasKey(p => p.Id);
e.Property(p => p.Name).HasMaxLength(200).IsRequired();
e.Property(p => p.Sku).HasMaxLength(64).IsRequired();
e.HasIndex(p => p.Sku).IsUnique();
e.Property(p => p.Price).HasColumnType("numeric(12,2)");
});
}
}
src/Acme.Store.Modules.Products/ProductsModule.cs
public sealed class ProductsModule : RestBehaviorModuleBase
{
public override ModuleDescriptor Describe() => new(
"Acme.Store.Modules.Products", "1.0.0", [Capability.Data]);
public override void RegisterServices(IServiceCollection services)
{
services.AddCephalonEntityFramework<ProductsDbContext>((sp, opts) =>
{
var conn = sp.GetRequiredService<IConfiguration>().GetConnectionString("Products");
opts.UseNpgsql(conn);
});
services.AddScoped<IProductCatalog, EfProductCatalog>();
}
/* ... */
}

Full walkthrough: Tutorial → First-app step 3: Wire EF Core.

Write goes to Postgres; reads come from ClickHouse. Module code uses two services with clear intent.

src/Acme.Store.Modules.Reports/ReportsModule.cs
public override void RegisterServices(IServiceCollection services)
{
// Write-side (Postgres) for transactional operations
services.AddCephalonEntityFramework<OrdersDbContext>((sp, opts) =>
opts.UseNpgsql(sp.GetRequiredService<IConfiguration>().GetConnectionString("OrdersWrite")));
// Read-side (ClickHouse) for analytics queries
services.AddCephalonClickHouse((sp, opts) =>
opts.UseConnection(sp.GetRequiredService<IConfiguration>().GetConnectionString("Analytics")));
services.AddScoped<IOrderWriter, EfOrderWriter>();
services.AddScoped<IOrderAnalytics, ClickHouseOrderAnalytics>();
}

A separate background job (or CDC pipeline via Debezium) keeps ClickHouse fed from Postgres.

Decorate a slow query with a Redis cache layer:

CachedProductCatalog.cs
public sealed class CachedProductCatalog(
EfProductCatalog inner,
IRedisCache cache) : IProductCatalog
{
public async Task<Product?> FindAsync(Sfid id, CancellationToken ct)
{
var key = $"products:{id}";
if (await cache.GetAsync<Product>(key, ct) is { } cached) return cached;
var product = await inner.FindAsync(id, ct);
if (product is not null)
await cache.SetAsync(key, product, TimeSpan.FromMinutes(10), ct);
return product;
}
}
// Register the decorator
services.AddScoped<EfProductCatalog>();
services.Decorate<IProductCatalog, CachedProductCatalog>();

Inject the tenant into DbContextOptions so EF Core applies row-level filters automatically.

services.AddCephalonEntityFramework<AppDb>((sp, opts) =>
{
var tenant = sp.GetRequiredService<ITenantContext>();
var conn = sp.GetRequiredService<ITenantConnectionResolver>().Resolve(tenant);
opts.UseNpgsql(conn);
opts.AddInterceptor(new TenantRlsInterceptor(tenant)); // injects tenant_id into every query
});

See Tutorial → Multi-tenant SaaS for the full pattern.

EF Core writes domain rows + outbox rows in the same transaction; Wolverine drains the outbox.

public async Task<Order> PlaceOrderAsync(PlaceOrderInput input, CancellationToken ct)
{
var order = new Order { /* … */ };
db.Orders.Add(order);
await publisher.PublishAsync(
new OrderPlaced(order.Id, order.CustomerId, order.Total),
ct); // goes into outbox, not the bus directly
await db.SaveChangesAsync(ct); // single transaction: order row + outbox row
return order;
}

The outbox guarantees at-least-once delivery even if the broker is down during commit. See Technology → Eventing.

Pattern 6: integration-testing with Testcontainers

Section titled “Pattern 6: integration-testing with Testcontainers”
public sealed class PostgresFixture : IAsyncLifetime
{
public string ConnectionString { get; private set; } = string.Empty;
private PostgreSqlContainer _container = null!;
public async Task InitializeAsync()
{
_container = new PostgreSqlBuilder()
.WithImage("postgres:16-alpine")
.WithDatabase("acmestore")
.Build();
await _container.StartAsync();
ConnectionString = _container.GetConnectionString();
}
public Task DisposeAsync() => _container.DisposeAsync().AsTask();
}

Full walkthrough: Tutorial → First-app step 7: Tests.

AspectDetail
Min version14+ (uses INCLUDE indexes, generated columns; older versions may work but aren’t gated by CI).
DriverNpgsql.EntityFrameworkCore.PostgreSQL (community-maintained).
JSONBEF Core supports Property<JsonDocument>(…) mapping.
Gotchas
  • Connection-pooling defaults are conservative; bump Maximum Pool Size for high concurrency.
  • Sfid keys map to char(26) or text — index them.
  • Migrations need explicit AddColumn → CopyData → DropColumn for non-nullable column adds on big tables.
LimitsCephalon.Data.Postgres JSONB query helpers are M2 — they work but may evolve.

SQL Server (Cephalon.Data + Microsoft.Data.SqlClient)

Section titled “SQL Server (Cephalon.Data + Microsoft.Data.SqlClient)”
AspectDetail
Min version2019+ for full feature set. Azure SQL fully supported.
DriverMicrosoft.EntityFrameworkCore.SqlServer.
Temporal tablesEF Core 8+ supports entity.ToTable(b => b.IsTemporal()) — works with CephalonEngine.
Gotchas
  • Sfid stored as nvarchar(26) performs slightly worse than Postgres char(26); use clustered key on Sfid only if access pattern is order-by-id.
  • SQL Server’s MERGE has known concurrency bugs — prefer explicit INSERT/UPDATE paths.
LimitsNative row-level security needs SQL Server 2016+ or Azure SQL.
AspectDetail
Min versionMySQL 8.0+, MariaDB 10.6+.
DriverMySqlConnector (recommended) or Oracle’s MySql.Data (legacy).
JSON columnsSupported via Property<JsonElement>(…) mapping.
Gotchas
  • Default collation in MySQL 8 is utf8mb4_0900_ai_ci — case- and accent-insensitive. Use utf8mb4_bin for case-sensitive identifiers.
  • Window functions require 8.0+.
AspectDetail
Min version6.0+ for full transaction support.
DriverMongoDB.EntityFrameworkCore (official, in preview) or raw MongoDB.Driver.
PatternUse Mongo for document-shaped data; relational data still belongs in Postgres / SQL Server.
Gotchas
  • EF Core integration is preview-quality. Raw driver is more stable.
  • Sfid serializes as string — index it like any other id.
  • Multi-document transactions require a replica set, not standalone Mongo.
AspectDetail
Min version6.2+ (ACL support, sorted-set commands).
DriverStackExchange.Redis.
Use forCache, distributed locks, rate limiting, pub/sub, leaderboards.
Not forLong-term storage. Treat Redis as ephemeral.
Gotchas
  • StackExchange.Redis multiplexer should be Singleton-scoped, not per-request.
  • Avoid KEYS * in production — it blocks. Use SCAN.
AspectDetail
Min version4.0+.
DriverCassandraCSharpDriver.
Use forDistributed write-heavy workloads where eventual consistency is acceptable.
Gotchas
  • Schema design ≠ relational. Model around your queries, not your domain.
  • No joins — denormalize aggressively.
  • Tunable consistency (ONE, QUORUM, ALL) — pick per query, not globally.
AspectDetail
Min version23.8+.
DriverClickHouse.Client.
Use forAnalytics, time-series, write-heavy logging.
PatternPair with Postgres for OLTP write side; ClickHouse as the read-side analytics store.
Gotchas
  • Inserts are batch-oriented — accumulate before flushing; single-row inserts are slow.
  • ClickHouse doesn’t have classic UPDATE semantics — use ReplacingMergeTree engine if you need updates.

Elasticsearch / OpenSearch (Cephalon.Data.Elasticsearch, Cephalon.Data.OpenSearch)

Section titled “Elasticsearch / OpenSearch (Cephalon.Data.Elasticsearch, Cephalon.Data.OpenSearch)”
AspectDetail
Min versionElasticsearch 8.x, OpenSearch 2.x.
DriverElastic.Clients.Elasticsearch for ES; OpenSearch.Client for OS.
Use forFull-text search, faceted queries, log aggregation.
PatternSource of truth in Postgres / SQL Server; project into ES/OS via background indexer or CDC.
Gotchas
  • Schema (mapping) is mostly fixed — design carefully; reindexing is expensive.
  • Refresh interval defaults to 1s; tune it (e.g. 30s) for write-heavy use.
AspectDetail
Min version5.x.
DriverNeo4j.Driver.
Use forRelationship-heavy queries (recommendation, fraud, social, dependency analysis).
PatternReplicate identifiers from your relational DB; store only the graph relationships in Neo4j.
AspectDetail
Min version1.7+.
DriverQdrant.Client.
Use forVector search, semantic retrieval (RAG), recommendation.
PatternUsed by Cephalon.Retrieval. Pair with embeddings from OpenAI / Cohere / local models.
Gotchas
  • Vectors must match the embedding model’s dimension (e.g. 1536 for text-embedding-3-small).
  • Use scalar quantization for large collections (>1M vectors) — saves memory at ~2% recall cost.
AspectDetail
Min versionNATS 2.10+ with JetStream enabled.
DriverNATS.Client.Core + NATS.Client.JetStream.
Use forPersistent streams, KV store, eventing with replay.
PatternCan serve as the eventing transport (alternative to RabbitMQ / Kafka).
AspectDetail
Min versionDebezium 2.x.
Use forReliably capturing changes from existing databases into a stream (Kafka / Pulsar / NATS).
PatternSet up Debezium connector against Postgres logical replication slot → Kafka topic → Wolverine consumer.
LimitsDebezium runs outside the engine — Cephalon.Data.Debezium provides the consumer side for the events Debezium emits.

EF Core migrations are the standard path:

Terminal window
# Add a migration
cd src/Acme.Store.Modules.Products
dotnet ef migrations add AddProducts --context ProductsDbContext --output-dir Data/Migrations
# Apply migrations (dev)
dotnet ef database update --context ProductsDbContext

In production, don’t apply migrations on startup. Use a dedicated migrator job:

Terminal window
dotnet ef migrations script --idempotent --output ./migrations/products-v1.2.0.sql

…then run that SQL through your normal deploy pipeline (Flyway, Liquibase, or a sidecar job in Kubernetes).

  • Cephalon.Data itself doesn’t ship a connection pool. That’s the driver’s job; tune at the connection-string level.
  • Mixed Sfid and Guid columns in the same DbContext is not recommended — pick one strategy per DbContext.
  • The data-layer’s outbox is EF-specific. If you don’t use EF Core, you need to bring your own outbox table and the eventing layer’s IOutboxStore implementation.
  • No multi-master writes. The engine assumes a primary-write topology. Multi-region active-active needs Cassandra or a custom approach.
  • Tenant sharding (one DB per tenant) is supported via the ITenantConnectionResolver pattern but not via a built-in router — you wire it.
  • Cephalon.Data.* adapters are M2. They work for the documented happy paths; edge cases (cross-database transactions, exotic types) may not be covered. Track maturity per package in Reference → Maturity audit.

Practical data-layer guidance from production usage.

  • Always AsNoTracking() for read-only queries. Tracking is expensive for projections; turn it off explicitly.
    await db.Products.AsNoTracking().Where(p => p.Active).ToListAsync(ct);
  • Use IExecutionStrategy for transient retries on cloud DBs. Azure SQL / AWS RDS occasionally drop connections; EF Core’s strategy hides the retry from your handler code.
  • Project to DTOs in the database, not in memory. Select(p => new ProductDto(p.Id, p.Name)) runs as SQL; ToList().Select(...) fetches every column.
  • Cache compiled queries for hot paths. EF compiles + caches automatically, but dotnet ef dbcontext optimize generates a compiled model that’s faster still — worth it for high-QPS apps.
  • Use ChangeTracker.AutoDetectChangesEnabled = false inside bulk-import loops; toggle back on at the end.
  • Set Maximum Pool Size on the connection string to a value that matches your concurrency. Default is 100 for Npgsql; bump if you see “timeout exhausted” errors.
  • Pool warmup at startup: open and immediately close a connection in OnStart. First request doesn’t pay the TLS-handshake tax.
  • Watch for connection leaks — every DbContext that’s new’d without Dispose holds a connection. Use DI scoping (AddDbContext) so the container handles lifetime.
  • Never run dotnet ef database update in production. Generate idempotent SQL scripts: dotnet ef migrations script --idempotent and apply via your normal SQL deploy pipeline.
  • Separate “schema migration” from “data migration”. Schema migrations are reversible; data migrations often aren’t. Run them in different deploy windows.
  • Online schema changes: never ALTER TABLE ADD COLUMN NOT NULL on a big table directly. Three-step: add nullable column → backfill in batches → set NOT NULL.
  • Tag migrations with a Jira / GitHub issue ID in the migration name (AddProductsTable_AS-1234). Makes git blame meaningful 18 months later.
ItemConventionExample
Table namesnake_case_plural (Postgres) or PascalCasePlural (SQL Server)products, Products
Column namesnake_case (Postgres) or PascalCase (SQL Server)created_at, CreatedAt
Index nameix_<table>_<col> or <Table>_<Col>_IXix_products_sku
Foreign keyfk_<table>_<referenced>fk_orders_customers
DbContext class<Module>DbContextProductsDbContext
  • Postgres: use INCLUDE indexes for index-only scans. Postgres 11+ supports CREATE INDEX … ON … INCLUDE (col1, col2). Lets queries hit the index without touching the heap.
  • Postgres: enable pg_stat_statements extension; it surfaces slow queries automatically.
  • SQL Server: avoid nvarchar(MAX) for fields that rarely exceed 4000 chars — use nvarchar(4000) so the column is indexable.
  • MongoDB: index every field you query / sort by. Mongo defaults to full-collection scan if no index exists.
  • Redis: MEMORY USAGE <key> shows per-key memory. Useful when you suspect a key is bloating the cache.
  • ClickHouse: order columns by access pattern — ORDER BY (tenant_id, created_at) for tenant-scoped time-series queries.
  • Elasticsearch: use _source filtering on every search to avoid shipping the entire doc back over the network when you only need a few fields.
  • Cache the slow query, not the whole entity. Smaller cache footprint, less churn.
  • TTL > “infinity + invalidation” for 90% of cases. Invalidation is hard. Short TTL with a fallback to source-of-truth handles most cases.
  • Stampede prevention: when many requests hit the cache miss simultaneously, only one should refresh. Use SemaphoreSlim keyed per cache-key, or Redis-based locks.
  • Cache-aside (read-through) is simpler than write-through. Stick with cache-aside unless you have a specific reason.
  • Outbox table needs an index on (processed_at, created_at) so the Wolverine drainer finds unprocessed messages quickly.
  • Outbox rows should be PURGED, not just marked processed. Set a cleanup job (or use ON DELETE rules) to keep the table small. A 100M-row outbox kills performance.
  • Inbox table for at-least-once dedup: index on (message_id, handler_name) with a TTL of ~30 days. Beyond TTL, dedup falls back to natural idempotency.
  • One DB round-trip per request, on the happy path. If a single endpoint hits the DB 5 times, that’s a refactor candidate.
  • Batch inserts — single inserts are ~10× slower than batched. Use AddRange + single SaveChanges.
  • Take(N) always, even on filtered queries. Defensive — a future bug that removes the filter shouldn’t OOM the host.
  • Profile before optimizing. Use dotnet-counters for runtime stats, dotnet-trace for hot paths. Optimising blind almost always picks the wrong thing.
Don’tDo
Generic repository pattern (IRepository<T>)Module-specific repository contracts (IProductCatalog) — domain-meaningful methods
Lazy-loading navigation propertiesExplicit Include(...) or projected DTOs — predictable SQL
SaveChanges after every writeGroup related writes in one transaction (one SaveChanges per unit-of-work)
Storing JSON blobs to “save schema work”Use real columns or a typed JSONB approach — easier to query, index, migrate
Mixing read-side and write-side in the same DbContext for non-trivial appsSeparate them; the engine’s read/write split pattern is there for a reason
Time-zone arithmetic in app codeStore UTC, convert at the edge (browser / API caller)

The engine-side snapshots from the engine repo (autoritative when in doubt):