StatusTypeRelevance

ClickHouse Integration Implementation

This document describes the concrete implementation path for the ClickHouse integration.

Repositories

Two repositories form the implementation backbone:

Responsibilities are split cleanly:

  • clickhouse-config owns deployment, SQL migrations, tenant bootstrap, and credential generation
  • superset-tenant-operator owns Superset tenant lifecycle and datasource wiring

ClickHouse deployment

ClickHouse is deployed through a Helm chart.

Current characteristics:

  • namespace: operations
  • service ports:
    • HTTP 8123
    • native 9000
  • one persistent volume
  • one StatefulSet
  • one replica at the moment

This deployment is intentionally small and operationally simple. It is sufficient for the current integration path.

Migration repository layout

The migration layout is environment-based:

migrations/
  development/
    global/
    tenants.txt
    tenants/
      <slug>/
        bootstrap/
        schema/
  production/
    global/
    tenants.txt
    tenants/
      <slug>/
        bootstrap/
        schema/

Purpose of each part:

  • global/: shared ClickHouse operational objects
  • bootstrap/: tenant database and access setup
  • schema/: tenant-specific tables and schema changes
  • tenants.txt: authoritative activation list

The environment split is explicit. Development and production are managed independently.

Migration runner

The migration runner is the operational tool for ClickHouse schema rollout.

Main commands:

  • migrate all
  • migrate global
  • migrate tenant --tenant <slug>
  • migrate bootstrap-tenant --tenant <slug>
  • migrate generate-credentials --tenant <slug>
  • migrate check

Runtime behavior

The runner:

  • loads environment settings from run.sh
  • reads SQL files from the selected migration tree
  • applies global migrations
  • applies tenant migrations for active tenants
  • tracks checksums in ClickHouse
  • checks drift between Git-managed state and runtime state

Tracking tables:

  • ops.schema_migrations_global
  • ops.schema_migrations_tenant

Important behavior

  • already applied migrations are skipped if checksum matches
  • checksum mismatch is a hard failure
  • tenant rollout only covers tenants listed in tenants.txt
  • there is no automatic sync or hidden reconciliation step

Tenant bootstrap

Tenant bootstrap creates the minimum ClickHouse identity and database structure.

Bootstrap files:

  • V1000__create_tenant_database.sql
  • V1001__create_tenant_access.sql

Generated objects:

  • database: tenant_<slug>
  • role: role_tenant_<slug>
  • user: user_tenant_<slug>

Base grant set:

  • SELECT
  • INSERT
  • ALTER
  • CREATE
  • DROP

The bootstrap step is intentionally separate from schema definition. First the tenant exists, then the tenant schema can be rolled out.

Schema rollout

Schema files are normal SQL migrations under:

migrations/<environment>/tenants/<slug>/schema/

Typical workflow:

  1. add a new V*.sql file
  2. run dry-run
  3. apply the migration
  4. run migrate check

This keeps the change path narrow and observable.

Credential generation

Tenant credentials are managed through the migration tooling as well.

generate-credentials:

  • sets or rotates the ClickHouse password for user_tenant_<slug>
  • can emit a Kubernetes Secret manifest

The generated Secret contains:

  • database
  • username
  • password

The default Secret naming pattern is compatible with the Superset tenant operator.

Superset tenant contract

The superset-tenant-operator uses SupersetTenant resources to manage isolated Superset instances.

Relevant fields:

  • spec.organization
  • spec.supersetConfig.clickhouseEnabled
  • spec.supersetConfig.clickhouseConnectionName
  • spec.supersetConfig.clickhouseClientVersion

Operator-level defaults provide:

  • CLICKHOUSE_HOST
  • CLICKHOUSE_PORT
  • default datasource display name
  • default ClickHouse client package version

Secret contract between ClickHouse and Superset

For a tenant organization <org>, the operator expects:

  • Secret name: superset-tenant-<org>-clickhouse-credentials

Required keys:

  • database
  • username
  • password

The operator checks that:

  • the Secret exists
  • the required keys exist

If the Secret is missing or incomplete, the tenant is not ready from the operator point of view.

Superset datasource registration

When ClickHouse is enabled for a tenant:

  1. the operator injects the ClickHouse credentials into the Superset pod as environment variables
  2. the rendered superset_config.py builds the SQLAlchemy URI
  3. Superset upserts the ClickHouse datasource automatically

Runtime environment variables used by Superset:

  • CLICKHOUSE_DATABASE
  • CLICKHOUSE_USER
  • CLICKHOUSE_PASSWORD

Shared connection settings:

  • CLICKHOUSE_HOST
  • CLICKHOUSE_PORT

Result:

  • each Superset tenant gets one ClickHouse datasource pointing to its own tenant database

Drift checks

migrate check provides operational verification in three areas:

  • migration tracking
  • access state
  • managed tables

This is enough for the current architecture because it answers the important operational questions:

  • were the expected migrations applied
  • does the tenant database and user exist
  • are the expected managed tables present

Standard onboarding sequence

  1. deploy ClickHouse
  2. bootstrap the tenant in the target environment
  3. add tenant schema SQL
  4. run migrate all --dry-run
  5. apply migrations
  6. run migrate check
  7. generate the tenant ClickHouse Secret
  8. apply the Secret for Superset
  9. create or update the SupersetTenant
  10. verify tenant health and datasource availability

Current and future scope

The implementation is designed so that deployment complexity can grow later without changing the integration contract.

Possible future work:

  • backup automation
  • replication
  • larger storage and recovery setup
  • more detailed alerting

These are deployment extensions. They do not require a different tenant model or a different Superset credential contract.