ClickHouse Integration Implementation
This document describes the concrete implementation path for the ClickHouse integration.
Repositories
Two repositories form the implementation backbone:
clickhouse-config— https://dev.azure.com/efscis/EFS-SDK/_git/clickhouse-configsuperset-tenant-operator— https://dev.azure.com/efscis/EFS-SDK/_git/superset-tenant-operator
Responsibilities are split cleanly:
clickhouse-configowns deployment, SQL migrations, tenant bootstrap, and credential generationsuperset-tenant-operatorowns Superset tenant lifecycle and datasource wiring
ClickHouse deployment
ClickHouse is deployed through a Helm chart.
Current characteristics:
- namespace:
operations - service ports:
- HTTP
8123 - native
9000
- HTTP
- one persistent volume
- one StatefulSet
- one replica at the moment
This deployment is intentionally small and operationally simple. It is sufficient for the current integration path.
Migration repository layout
The migration layout is environment-based:
migrations/
development/
global/
tenants.txt
tenants/
<slug>/
bootstrap/
schema/
production/
global/
tenants.txt
tenants/
<slug>/
bootstrap/
schema/Purpose of each part:
global/: shared ClickHouse operational objectsbootstrap/: tenant database and access setupschema/: tenant-specific tables and schema changestenants.txt: authoritative activation list
The environment split is explicit. Development and production are managed independently.
Migration runner
The migration runner is the operational tool for ClickHouse schema rollout.
Main commands:
migrate allmigrate globalmigrate tenant --tenant <slug>migrate bootstrap-tenant --tenant <slug>migrate generate-credentials --tenant <slug>migrate check
Runtime behavior
The runner:
- loads environment settings from
run.sh - reads SQL files from the selected migration tree
- applies global migrations
- applies tenant migrations for active tenants
- tracks checksums in ClickHouse
- checks drift between Git-managed state and runtime state
Tracking tables:
ops.schema_migrations_globalops.schema_migrations_tenant
Important behavior
- already applied migrations are skipped if checksum matches
- checksum mismatch is a hard failure
- tenant rollout only covers tenants listed in
tenants.txt - there is no automatic sync or hidden reconciliation step
Tenant bootstrap
Tenant bootstrap creates the minimum ClickHouse identity and database structure.
Bootstrap files:
V1000__create_tenant_database.sqlV1001__create_tenant_access.sql
Generated objects:
- database:
tenant_<slug> - role:
role_tenant_<slug> - user:
user_tenant_<slug>
Base grant set:
SELECTINSERTALTERCREATEDROP
The bootstrap step is intentionally separate from schema definition. First the tenant exists, then the tenant schema can be rolled out.
Schema rollout
Schema files are normal SQL migrations under:
migrations/<environment>/tenants/<slug>/schema/Typical workflow:
- add a new
V*.sqlfile - run dry-run
- apply the migration
- run
migrate check
This keeps the change path narrow and observable.
Credential generation
Tenant credentials are managed through the migration tooling as well.
generate-credentials:
- sets or rotates the ClickHouse password for
user_tenant_<slug> - can emit a Kubernetes Secret manifest
The generated Secret contains:
databaseusernamepassword
The default Secret naming pattern is compatible with the Superset tenant operator.
Superset tenant contract
The superset-tenant-operator uses SupersetTenant resources to manage isolated Superset instances.
Relevant fields:
spec.organizationspec.supersetConfig.clickhouseEnabledspec.supersetConfig.clickhouseConnectionNamespec.supersetConfig.clickhouseClientVersion
Operator-level defaults provide:
CLICKHOUSE_HOSTCLICKHOUSE_PORT- default datasource display name
- default ClickHouse client package version
Secret contract between ClickHouse and Superset
For a tenant organization <org>, the operator expects:
- Secret name:
superset-tenant-<org>-clickhouse-credentials
Required keys:
databaseusernamepassword
The operator checks that:
- the Secret exists
- the required keys exist
If the Secret is missing or incomplete, the tenant is not ready from the operator point of view.
Superset datasource registration
When ClickHouse is enabled for a tenant:
- the operator injects the ClickHouse credentials into the Superset pod as environment variables
- the rendered
superset_config.pybuilds the SQLAlchemy URI - Superset upserts the ClickHouse datasource automatically
Runtime environment variables used by Superset:
CLICKHOUSE_DATABASECLICKHOUSE_USERCLICKHOUSE_PASSWORD
Shared connection settings:
CLICKHOUSE_HOSTCLICKHOUSE_PORT
Result:
- each Superset tenant gets one ClickHouse datasource pointing to its own tenant database
Drift checks
migrate check provides operational verification in three areas:
- migration tracking
- access state
- managed tables
This is enough for the current architecture because it answers the important operational questions:
- were the expected migrations applied
- does the tenant database and user exist
- are the expected managed tables present
Standard onboarding sequence
- deploy ClickHouse
- bootstrap the tenant in the target environment
- add tenant schema SQL
- run
migrate all --dry-run - apply migrations
- run
migrate check - generate the tenant ClickHouse Secret
- apply the Secret for Superset
- create or update the
SupersetTenant - verify tenant health and datasource availability
Current and future scope
The implementation is designed so that deployment complexity can grow later without changing the integration contract.
Possible future work:
- backup automation
- replication
- larger storage and recovery setup
- more detailed alerting
These are deployment extensions. They do not require a different tenant model or a different Superset credential contract.