StatusTypeRelevance

ClickHouse Integration

This document describes a simple and stable way to integrate ClickHouse into the platform as the main analytics database for Superset.

The concept follows one rule:

Keep schema, rollout, and tenant integration explicit. Use plain SQL, fixed contracts, and as few moving parts as possible.

Goal

  • ClickHouse provides the analytical database for tenant-specific reporting and exploration.
  • Superset connects directly to ClickHouse.
  • Each tenant works in its own isolated ClickHouse database.
  • Schema is managed in Git as executable SQL.
  • Data loading is separate from schema rollout.

Core idea

The platform does not need a modeling layer for ClickHouse.

The stable path is:

  1. deploy ClickHouse
  2. create one database per tenant
  3. manage schema as SQL migrations in Git
  4. create tenant credentials
  5. expose those credentials to Superset through a fixed Secret contract
  6. let ingestion workflows load data into the existing tables

This keeps the system understandable for operators and developers:

  • the SQL in Git is the schema
  • the migration runner applies that SQL
  • Superset only needs working tenant credentials

Architecture

Git
  ├─ ClickHouse deployment manifests
  ├─ global SQL migrations
  ├─ tenant bootstrap SQL
  ├─ tenant schema SQL
  └─ tenant credential manifests
 
Runtime
  ├─ one ClickHouse deployment
  ├─ one ClickHouse database per tenant
  ├─ one ClickHouse user per tenant
  ├─ one Superset tenant per organization
  └─ one Superset datasource per tenant pointing to its ClickHouse database
 
Data flow
  ├─ data lands in curated storage
  ├─ loader jobs insert data into ClickHouse tables
  └─ Superset reads from ClickHouse

Principles

  • SQL-first: schema is written directly as SQL.
  • Git-first: all managed schema changes are reviewed in Git.
  • Tenant isolation first: every tenant gets its own database and credentials.
  • Explicit rollout: schema changes are applied deliberately, not by hidden background automation.
  • Small contracts: Superset integration depends only on a small credential Secret.
  • Separate responsibilities: deployment, schema rollout, Superset lifecycle, and ingestion are independent workflows.

ClickHouse deployment

ClickHouse runs as a Kubernetes workload and is treated as a shared analytics backend.

Current expectations:

  • one Kubernetes deployment context for ClickHouse
  • persistent storage
  • stable in-cluster service name
  • HTTP access for Superset
  • native access for operational tooling

The current deployment may stay small for now. It can be expanded later without changing the tenant contract.

Tenant model

Each tenant gets:

  • one database: tenant_<slug>
  • one role: role_tenant_<slug>
  • one user: user_tenant_<slug>

The tenant user is used by Superset for that tenant.

This keeps the security model simple:

  • Superset tenant A only sees database A
  • Superset tenant B only sees database B
  • there is no shared base database in the normal path

Schema management

Schema is managed directly as versioned SQL migrations.

Migration structure:

migrations/
  <environment>/
    global/
    tenants/
      <tenant>/
        bootstrap/
        schema/
    tenants.txt

Meaning:

  • global/ contains shared operational SQL
  • bootstrap/ contains tenant database and access setup
  • schema/ contains tenant table definitions and later changes
  • tenants.txt defines which tenants are active for rollout

Rules:

  • the SQL file is the contract
  • the database name is written explicitly in tenant SQL
  • schema changes are additive and versioned
  • rollout happens through a migration command, not through ad-hoc manual edits in ClickHouse

Table design

This concept does not enforce a special naming model beyond clarity and stability.

Tables should be:

  • domain-specific
  • easy to understand
  • stable enough for dashboards and queries

Good examples are simple table names such as:

  • batterydata
  • reservationdata_device
  • reservationdata_reservations

The concept does not require a separate space dimension in the schema contract.

Data loading

Data loading is outside the schema mechanism.

That means:

  • schema migrations create and change tables
  • loader jobs insert data into those tables
  • loaders can be implemented independently of the migration runner

Typical sources:

  • Parquet files in curated storage
  • batch exports
  • domain-specific processing pipelines

This separation is useful because schema rollout and data ingestion have different failure modes and different release cycles.

Superset integration

Superset is integrated per tenant.

Each SupersetTenant can enable a ClickHouse datasource.

The runtime contract is:

  • ClickHouse host and port are platform-level settings
  • the tenant-specific database name, username, and password come from a Kubernetes Secret
  • the operator injects these values into the Superset pod
  • Superset registers or updates the ClickHouse datasource automatically

Required Secret keys:

  • database
  • username
  • password

This is enough to make the Superset side simple and repeatable.

Operational workflow

The standard path for a new tenant is:

  1. create tenant bootstrap SQL
  2. add tenant schema SQL
  3. activate the tenant in the environment
  4. run migration dry-run
  5. apply migrations
  6. generate tenant credentials
  7. publish the tenant credential Secret for Superset
  8. create or update the SupersetTenant
  9. verify datasource readiness

The standard path for a schema change is:

  1. add a new SQL migration file
  2. review it in Git
  3. run dry-run
  4. apply it
  5. run state checks
  6. verify dashboards and queries

What stays stable over time

Even if the ClickHouse deployment grows later, these contracts should remain stable:

  • tenant database naming
  • tenant user naming
  • migration directory layout
  • explicit SQL migrations
  • Secret keys for Superset integration

That allows the platform to improve deployment topology later without redesigning the integration model.

Summary

ClickHouse integration is simple when the contract stays small:

  • ClickHouse is the analytics database
  • Superset connects directly to it
  • every tenant gets its own database and credentials
  • schema is managed as SQL in Git
  • migrations are applied explicitly
  • ingestion stays separate from schema rollout

This is not the most abstract solution. It is the one that is easiest to understand, operate, and extend safely.