ClickHouse

easy-db-lab supports deploying ClickHouse clusters on Kubernetes for analytics workloads alongside your Cassandra cluster.

Overview

ClickHouse is deployed as a StatefulSet on K3s with ClickHouse Keeper for distributed coordination. The deployment requires a minimum of 3 nodes.

Quick Start

Create a 6-node cluster and deploy ClickHouse with 2 shards:

# Initialize and start a 6-node cluster
easy-db-lab init my-cluster --db 6 --up

# Deploy ClickHouse (2 shards x 3 replicas)
easy-db-lab clickhouse start

Configuring ClickHouse

Use clickhouse init to configure ClickHouse settings before starting the cluster:

# Configure S3 cache size (default: 10Gi)
easy-db-lab clickhouse init --s3-cache 50Gi

# Disable write-through caching
easy-db-lab clickhouse init --s3-cache-on-write false
| Option | Description | Default |
|--------|-------------|---------|
| --s3-cache | Size of the local S3 cache | 10Gi |
| --s3-cache-on-write | Cache data during write operations | true |
| --s3-tier-move-factor | Move data to S3 tier when local disk free space falls below this fraction (0.0-1.0) | 0.2 |
| --replicas-per-shard | Number of replicas per shard | 3 |

Configuration is saved to the cluster state and applied when you run clickhouse start.

Starting ClickHouse

To deploy ClickHouse on an existing cluster:

easy-db-lab clickhouse start

Options

| Option | Description | Default |
|--------|-------------|---------|
| --timeout | Seconds to wait for pods to be ready | 300 |
| --skip-wait | Skip waiting for pods to be ready | false |
| --replicas | Number of ClickHouse server replicas | Number of db nodes |
| --replicas-per-shard | Number of replicas per shard | 3 |

Example with Custom Settings

# 6 nodes with 3 replicas per shard = 2 shards
easy-db-lab clickhouse start --replicas 6 --replicas-per-shard 3

# 9 nodes with 3 replicas per shard = 3 shards
easy-db-lab clickhouse start --replicas 9 --replicas-per-shard 3

Cluster Topology

ClickHouse is deployed with a sharded, replicated architecture. The total number of replicas must be divisible by --replicas-per-shard.

Shard and Replica Assignment

The cluster named easy_db_lab is automatically configured based on your replica count:

| Configuration | Shards | Replicas/Shard | Total Nodes |
|---------------|--------|----------------|-------------|
| Default (3 nodes) | 1 | 3 | 3 |
| 6 nodes, 3/shard | 2 | 3 | 6 |
| 9 nodes, 3/shard | 3 | 3 | 9 |
| 6 nodes, 2/shard | 3 | 2 | 6 |

Pod-to-Node Pinning

Each ClickHouse pod is pinned to a specific database node using Local PersistentVolumes with node affinity:

  • clickhouse-0 always runs on db0
  • clickhouse-1 always runs on db1
  • clickhouse-N always runs on dbN

This guarantees:

  1. Consistent shard assignment - A pod's shard is calculated from its ordinal using integer division: shard = (ordinal / replicas_per_shard) + 1
  2. Data locality - Data stored on a node stays with that node across pod restarts
  3. Predictable performance - No data movement when pods restart

Shard Calculation Example

With 6 replicas and 3 replicas per shard:

| Pod | Ordinal | Shard | Node |
|-----|---------|-------|------|
| clickhouse-0 | 0 | 1 | db0 |
| clickhouse-1 | 1 | 1 | db1 |
| clickhouse-2 | 2 | 1 | db2 |
| clickhouse-3 | 3 | 2 | db3 |
| clickhouse-4 | 4 | 2 | db4 |
| clickhouse-5 | 5 | 2 | db5 |
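The assignments above follow directly from integer division on the pod ordinal. A quick shell sketch of the same arithmetic (pod names assumed to follow the clickhouse-N convention):

```shell
# Shard assignment: shard = ordinal / replicas_per_shard + 1 (integer division)
replicas_per_shard=3
for ordinal in 0 1 2 3 4 5; do
  echo "clickhouse-${ordinal} -> shard $(( ordinal / replicas_per_shard + 1 ))"
done
```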

Checking Status

To check the status of your ClickHouse cluster:

easy-db-lab clickhouse status

This displays:

  • Pod status and health
  • Access URLs for the Play UI and HTTP interface
  • Native protocol connection details

Accessing ClickHouse

After deployment, ClickHouse is accessible via:

| Interface | URL/Port | Description |
|-----------|----------|-------------|
| Play UI | http://<db-node-ip>:8123/play | Interactive web query interface |
| HTTP API | http://<db-node-ip>:8123 | REST API for queries |
| Native Protocol | <db-node-ip>:9000 | High-performance binary protocol |
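For example, the HTTP interface accepts SQL directly in the request body, so a quick connectivity check needs nothing but curl (substitute a real db node address for the placeholder):

```shell
# Run a query over the HTTP interface; replace <db-node-ip> with a db node's address
curl 'http://<db-node-ip>:8123/' --data-binary 'SELECT version()'
```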

Creating Tables

ClickHouse supports distributed, replicated tables that span multiple shards. The recommended pattern uses ReplicatedMergeTree for local replicated storage and Distributed for querying across shards.

Distributed Replicated Tables

Create a local replicated table on all nodes, then a distributed table for queries:

-- Step 1: Create local replicated table on all nodes
CREATE TABLE events_local ON CLUSTER easy_db_lab (
    id UInt64,
    timestamp DateTime,
    event_type String,
    data String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (timestamp, id)
SETTINGS storage_policy = 's3_main';

-- Step 2: Create distributed table for querying across all shards
CREATE TABLE events ON CLUSTER easy_db_lab AS events_local
ENGINE = Distributed(easy_db_lab, default, events_local, rand());

Key points:

  • ON CLUSTER easy_db_lab runs the DDL on all nodes
  • {shard} and {replica} are ClickHouse macros automatically set per node
  • ReplicatedMergeTree replicates data within a shard using ClickHouse Keeper
  • Distributed routes queries and inserts across shards
  • rand() distributes inserts randomly; use a column for deterministic sharding
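As a sketch of that last point, the sharding expression can be a column hash so that rows with the same key always land on the same shard; the table name events_by_id below is illustrative:

```sql
-- Deterministic sharding: all rows with the same id go to the same shard
CREATE TABLE events_by_id ON CLUSTER easy_db_lab AS events_local
ENGINE = Distributed(easy_db_lab, default, events_local, cityHash64(id));
```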

Querying and Inserting

-- Insert through distributed table (auto-sharded)
INSERT INTO events VALUES (1, now(), 'click', '{"page": "/home"}');

-- Query across all shards
SELECT count(*) FROM events WHERE event_type = 'click';

-- Query a specific shard (via local table)
SELECT count(*) FROM events_local WHERE event_type = 'click';

Table Engine Comparison

| Engine | Use Case | Replication | Sharding |
|--------|----------|-------------|----------|
| MergeTree | Single-node, no replication | No | No |
| ReplicatedMergeTree | Replicated within shard | Yes | No |
| Distributed | Query/insert across shards | Via underlying table | Yes |

Storage Policies

ClickHouse is configured with two storage policies. You select the policy when creating a table using the SETTINGS storage_policy clause.

Policy Comparison

| Aspect | local | s3_main | s3_tier |
|--------|-------|---------|---------|
| Storage Location | Local NVMe disks | S3 bucket with configurable local cache | Hybrid: starts local, moves to S3 when disk fills |
| Performance | Best latency, highest throughput | Higher latency, cache-dependent | Good initially, degrades as data moves to S3 |
| Capacity | Limited by disk size | Virtually unlimited | Virtually unlimited |
| Cost | Included in instance cost | S3 storage + request costs | S3 storage + request costs |
| Data Persistence | Lost when cluster is destroyed | Persists independently | Persists independently |
| Best For | Benchmarks, low-latency queries | Large datasets, cost-sensitive workloads | Mixed hot/cold workloads with automatic tiering |

Local Storage (local)

The default policy stores data on local NVMe disks attached to the database nodes. This provides the best performance for latency-sensitive workloads.

CREATE TABLE my_table (...)
ENGINE = MergeTree()
ORDER BY id
SETTINGS storage_policy = 'local';

If you omit the storage_policy setting, tables use local storage by default.

When to use local storage:

  • Performance benchmarking where latency matters
  • Temporary or experimental datasets
  • Workloads with predictable data sizes that fit on local disks
  • When you don't need data to persist after cluster teardown

S3 Storage (s3_main)

The S3 policy stores data in your configured S3 bucket with a local cache for frequently accessed data. The cache size defaults to 10Gi and can be configured with clickhouse init --s3-cache. Write-through caching is enabled by default (--s3-cache-on-write true), which caches data during writes so subsequent reads can be served from cache immediately. This is ideal for large datasets where storage cost matters more than latency.

Prerequisite: Your cluster must be initialized with an S3 bucket. Set this during init:

easy-db-lab init my-cluster --s3-bucket my-clickhouse-data

Then create tables with S3 storage:

CREATE TABLE my_table (...)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/default/my_table', '{replica}')
ORDER BY id
SETTINGS storage_policy = 's3_main';

When to use S3 storage:

  • Large analytical datasets (terabytes+)
  • Data that should persist across cluster restarts
  • Cost-sensitive workloads where storage cost > compute cost
  • Sharing data between multiple clusters

How the cache works:

  • Hot (frequently accessed) data is cached locally for fast reads
  • Cold data is fetched from S3 on demand
  • Cache is automatically managed by ClickHouse
  • First query on cold data will be slower; subsequent queries use cache
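To see the configured disks (including the cache-backed S3 disk) and their space usage, ClickHouse's system tables are useful; exact column availability can vary slightly by ClickHouse version:

```sql
-- List configured disks with their free and total space
SELECT name, path, type, free_space, total_space
FROM system.disks;
```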

S3 Tiered Storage (s3_tier)

The S3 tiered policy provides automatic data movement from local disks to S3 based on disk space availability. This policy starts with local storage and automatically moves data to S3 when local disk space runs low, providing the best of both worlds: fast local performance for hot data and unlimited S3 capacity for cold data.

Prerequisite: Your cluster must be initialized with an S3 bucket. Set this during init:

easy-db-lab init my-cluster --s3-bucket my-clickhouse-data

Configure the tiering behavior before starting ClickHouse:

# Move data to S3 when local disk free space falls below 20% (default)
easy-db-lab clickhouse init --s3-tier-move-factor 0.2

# More aggressive tiering - move when free space < 50%
easy-db-lab clickhouse init --s3-tier-move-factor 0.5

Then create tables with S3 tiered storage:

CREATE TABLE my_table (...)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/default/my_table', '{replica}')
ORDER BY id
SETTINGS storage_policy = 's3_tier';

When to use S3 tiered storage:

  • Workloads with mixed hot/cold data access patterns
  • Growing datasets that may outgrow local disk capacity
  • Automatic cost optimization without manual intervention
  • Local performance for recent data with S3 capacity for historical data

How automatic tiering works:

  • New data is written to local disks first (fast writes)
  • When local disk free space falls below the configured threshold (default: 20%), ClickHouse automatically moves the oldest data to S3
  • Data on S3 is still queryable but with higher latency
  • The local cache (configured with --s3-cache) helps performance for frequently accessed S3 data
  • Manual moves are also possible: ALTER TABLE my_table MOVE PARTITION tuple() TO DISK 's3'
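To verify that tiering is actually moving data, you can check which disk each active part currently lives on (my_table is the illustrative name from the example above):

```sql
-- Show the disk backing each active data part
SELECT partition, name, disk_name
FROM system.parts
WHERE table = 'my_table' AND active;
```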

Stopping ClickHouse

To remove the ClickHouse cluster:

easy-db-lab clickhouse stop

This removes all ClickHouse pods, services, and associated resources from Kubernetes.

Monitoring

ClickHouse metrics are automatically integrated with the observability stack:

  • Grafana Dashboard: Pre-configured dashboard for ClickHouse metrics
  • Metrics Port: 9363 for Prometheus-compatible metrics
  • Logs Dashboard: Dedicated dashboard for ClickHouse logs
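A quick way to confirm the metrics endpoint is up (the /metrics path is the conventional Prometheus endpoint; substitute a real db node address for the placeholder):

```shell
# Fetch Prometheus-format metrics from a ClickHouse node
curl 'http://<db-node-ip>:9363/metrics' | head
```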

Architecture

The ClickHouse deployment includes:

  • ClickHouse Server: StatefulSet with configurable replicas
  • ClickHouse Keeper: 3-node cluster for distributed coordination (ZooKeeper-compatible)
  • Services: Headless services for internal communication
  • ConfigMaps: Server and Keeper configuration
  • Local PersistentVolumes: One PV per node for data locality

Storage Architecture

ClickHouse uses Local PersistentVolumes to guarantee pod-to-node pinning:

  1. During cluster creation, each db node is labeled with its ordinal (easydblab.com/node-ordinal=0, etc.)
  2. Local PVs are created with node affinity matching these ordinals
  3. PVs are pre-bound to specific PVCs (e.g., data-clickhouse-0 binds to the PV on db0)
  4. The StatefulSet's volumeClaimTemplate requests storage from these pre-bound PVs

This ensures clickhouse-X always runs on dbX, providing:

  • Consistent shard assignments across restarts
  • Data locality (no network storage overhead)
  • Predictable failover behavior
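The pre-binding mechanism can be sketched as a PersistentVolume manifest. The names, capacity, path, and storage class below are illustrative assumptions, but the claimRef and nodeAffinity pattern matches the steps above:

```yaml
# Sketch: Local PV pinned to db0 and pre-bound to the StatefulSet's first PVC
apiVersion: v1
kind: PersistentVolume
metadata:
  name: clickhouse-pv-0
spec:
  capacity:
    storage: 100Gi
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  claimRef:
    namespace: default
    name: data-clickhouse-0        # pre-binds this PV to clickhouse-0's PVC
  local:
    path: /mnt/data/clickhouse
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: easydblab.com/node-ordinal
              operator: In
              values: ["0"]        # pins the volume (and thus the pod) to db0
```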

Ports

| Port | Purpose |
|------|---------|
| 8123 | HTTP interface |
| 9000 | Native protocol |
| 9009 | Inter-server communication |
| 9363 | Metrics |
| 2181 | Keeper client |
| 9234 | Keeper Raft |