ClickHouse Integration
This document describes a simple and stable way to integrate ClickHouse into the platform as the main analytics database for Superset.
The concept follows one rule:
Keep schema, rollout, and tenant integration explicit. Use plain SQL, fixed contracts, and as few moving parts as possible.
Goal
- ClickHouse provides the analytical database for tenant-specific reporting and exploration.
- Superset connects directly to ClickHouse.
- Each tenant works in its own isolated ClickHouse database.
- Schema is managed in Git as executable SQL.
- Data loading is separate from schema rollout.
Core idea
The platform does not need a modeling layer for ClickHouse.
The stable path is:
- deploy ClickHouse
- create one database per tenant
- manage schema as SQL migrations in Git
- create tenant credentials
- expose those credentials to Superset through a fixed Secret contract
- let ingestion workflows load data into the existing tables
This keeps the system understandable for operators and developers:
- the SQL in Git is the schema
- the migration runner applies that SQL
- Superset only needs working tenant credentials
Architecture
Git
├─ ClickHouse deployment manifests
├─ global SQL migrations
├─ tenant bootstrap SQL
├─ tenant schema SQL
└─ tenant credential manifests
Runtime
├─ one ClickHouse deployment
├─ one ClickHouse database per tenant
├─ one ClickHouse user per tenant
├─ one Superset tenant per organization
└─ one Superset datasource per tenant pointing to its ClickHouse database
Data flow
├─ data lands in curated storage
├─ loader jobs insert data into ClickHouse tables
└─ Superset reads from ClickHouse
Principles
- SQL-first: schema is written directly as SQL.
- Git-first: all managed schema changes are reviewed in Git.
- Tenant isolation first: every tenant gets its own database and credentials.
- Explicit rollout: schema changes are applied deliberately, not by hidden background automation.
- Small contracts: Superset integration depends only on a small credential Secret.
- Separate responsibilities: deployment, schema rollout, Superset lifecycle, and ingestion are independent workflows.
ClickHouse deployment
ClickHouse runs as a Kubernetes workload and is treated as a shared analytics backend.
Current expectations:
- one Kubernetes deployment context for ClickHouse
- persistent storage
- stable in-cluster service name
- HTTP access for Superset
- native access for operational tooling
The deployment can stay small for now. It can be expanded later without changing the tenant contract.
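As a hedged sketch, HTTP access for Superset could look like the following SQLAlchemy URI, assuming the clickhouse-connect driver and a hypothetical in-cluster service name; tenant slug, namespace, and port are placeholders:

```
clickhousedb://user_tenant_acme:<password>@clickhouse.analytics.svc:8123/tenant_acme
```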
Tenant model
Each tenant gets:
- one database: tenant_<slug>
- one role: role_tenant_<slug>
- one user: user_tenant_<slug>
The tenant user is used by Superset for that tenant.
This keeps the security model simple:
- Superset tenant A only sees database A
- Superset tenant B only sees database B
- there is no shared base database in the normal path
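As a sketch, the bootstrap SQL for a hypothetical tenant slug `acme` could look like this; the grant set and password handling are illustrative assumptions, not a fixed contract:

```sql
-- Create the isolated tenant database.
CREATE DATABASE IF NOT EXISTS tenant_acme;

-- Role that carries read access to the tenant database.
CREATE ROLE IF NOT EXISTS role_tenant_acme;
GRANT SELECT ON tenant_acme.* TO role_tenant_acme;

-- User that Superset connects with for this tenant.
CREATE USER IF NOT EXISTS user_tenant_acme IDENTIFIED BY '<generated-password>';
GRANT role_tenant_acme TO user_tenant_acme;
```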
Schema management
Schema is managed directly as versioned SQL migrations.
Migration structure:
migrations/
  <environment>/
    global/
    tenants/
      <tenant>/
        bootstrap/
        schema/
    tenants.txt
Meaning:
- global/ contains shared operational SQL
- bootstrap/ contains tenant database and access setup
- schema/ contains tenant table definitions and later changes
- tenants.txt defines which tenants are active for rollout
Rules:
- the SQL file is the contract
- the database name is written explicitly in tenant SQL
- schema changes are additive and versioned
- rollout happens through a migration command, not through ad-hoc manual edits in ClickHouse
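An additive, versioned change is then a single reviewed SQL file that names the database explicitly. The file name, table, and column below are illustrative:

```sql
-- migrations/<environment>/tenants/acme/schema/0002_add_state_of_charge.sql
-- Additive change: a new nullable column is safe for existing queries.
ALTER TABLE tenant_acme.batterydata
    ADD COLUMN IF NOT EXISTS state_of_charge Nullable(Float64);
```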
Table design
This concept does not enforce a special naming model beyond clarity and stability.
Tables should be:
- domain-specific
- easy to understand
- stable enough for dashboards and queries
Good examples are simple table names such as:
- batterydata
- reservationdata_device
- reservationdata_reservations
The concept does not require a separate space dimension in the schema contract.
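A minimal table definition in this style could look as follows; the columns, engine choice, and ordering key are assumptions for illustration:

```sql
-- Illustrative domain table; MergeTree with a time-series ordering key.
CREATE TABLE IF NOT EXISTS tenant_acme.batterydata
(
    device_id    String,
    measured_at  DateTime,
    voltage      Float64,
    temperature  Float64
)
ENGINE = MergeTree
ORDER BY (device_id, measured_at);
```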
Data loading
Data loading is outside the schema mechanism.
That means:
- schema migrations create and change tables
- loader jobs insert data into those tables
- loaders can be implemented independently of the migration runner
Typical sources:
- Parquet files in curated storage
- batch exports
- domain-specific processing pipelines
This separation is useful because schema rollout and data ingestion have different failure modes and different release cycles.
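As one hedged example, a loader job could insert Parquet data from curated storage using ClickHouse's s3 table function; the bucket URL and column list are placeholders:

```sql
-- Illustrative load from curated Parquet storage into an existing table.
INSERT INTO tenant_acme.batterydata
SELECT device_id, measured_at, voltage, temperature
FROM s3(
    'https://storage.example.com/curated/batterydata/*.parquet',
    'Parquet'
);
```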
Superset integration
Superset is integrated per tenant.
Each SupersetTenant can enable a ClickHouse datasource.
The runtime contract is:
- ClickHouse host and port are platform-level settings
- the tenant-specific database name, username, and password come from a Kubernetes Secret
- the operator injects these values into the Superset pod
- Superset registers or updates the ClickHouse datasource automatically
Required Secret keys:
- database
- username
- password
This is enough to make the Superset side simple and repeatable.
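A sketch of the credential Secret, with the three required keys; the Secret name, namespace, and values are assumptions:

```yaml
# Illustrative tenant credential Secret for the Superset integration.
apiVersion: v1
kind: Secret
metadata:
  name: superset-clickhouse-acme
  namespace: superset
type: Opaque
stringData:
  database: tenant_acme
  username: user_tenant_acme
  password: <generated-password>
```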
Operational workflow
The standard path for a new tenant is:
- create tenant bootstrap SQL
- add tenant schema SQL
- activate the tenant in the environment
- run migration dry-run
- apply migrations
- generate tenant credentials
- publish the tenant credential Secret for Superset
- create or update the SupersetTenant
- verify datasource readiness
The standard path for a schema change is:
- add a new SQL migration file
- review it in Git
- run dry-run
- apply it
- run state checks
- verify dashboards and queries
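The state checks can stay simple. For example, a query against ClickHouse system tables confirms that the expected tables exist in a tenant database (tenant name is a placeholder):

```sql
-- List tables in a tenant database to verify the migration landed.
SELECT name, engine
FROM system.tables
WHERE database = 'tenant_acme'
ORDER BY name;
```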
What stays stable over time
Even if the ClickHouse deployment grows later, these contracts should remain stable:
- tenant database naming
- tenant user naming
- migration directory layout
- explicit SQL migrations
- Secret keys for Superset integration
That allows the platform to improve deployment topology later without redesigning the integration model.
Summary
ClickHouse integration is simple when the contract stays small:
- ClickHouse is the analytics database
- Superset connects directly to it
- every tenant gets its own database and credentials
- schema is managed as SQL in Git
- migrations are applied explicitly
- ingestion stays separate from schema rollout
This is not the most abstract solution. It is the one that is easiest to understand, operate, and extend safely.