Tutorial: From Setup to Running Cassandra

This tutorial walks you through creating a Cassandra cluster from scratch, covering initialization, infrastructure provisioning, and Cassandra configuration.

Prerequisites

Before starting, ensure you've completed the Setup process by running easy-db-lab setup-profile.

Part 1: Initialize Your Cluster

The init command creates local configuration files for your cluster. It does not provision AWS resources yet.

easy-db-lab init my-cluster

This creates a 3-node Cassandra cluster by default.

Init Options

Option Description Default
--db, --cassandra, -c Number of Cassandra instances 3
--app, --stress, -s Number of stress/application instances 0
--instance, -i Cassandra instance type r3.2xlarge
--stress-instance, -si Stress instance type c7i.2xlarge
--azs, -z Availability zones (e.g., a,b,c) all available
--arch, -a CPU architecture (AMD64, ARM64) AMD64
--ebs.type EBS volume type (NONE, gp2, gp3, io1, io2) NONE
--ebs.size EBS volume size in GB 256
--ebs.iops EBS IOPS (gp3 only) 0
--ebs.throughput EBS throughput (gp3 only) 0
--until When instances can be deleted tomorrow
--tag Custom tags (key=value, repeatable) -
--vpc Use existing VPC ID -
--up Auto-provision after init false
--clean Remove existing config first false

Examples

Basic 3-node cluster:

easy-db-lab init my-cluster

5-node cluster with 2 stress nodes:

easy-db-lab init my-cluster --db 5 --stress 2

Production-like cluster with EBS storage:

easy-db-lab init prod-test --db 5 --ebs.type gp3 --ebs.size 500 --ebs.iops 3000

ARM64 cluster for Graviton instances:

easy-db-lab init my-cluster --arch ARM64 --instance r7g.2xlarge

Initialize and provision in one step:

easy-db-lab init my-cluster --up

Part 2: Launch Infrastructure

Once initialized, provision the AWS infrastructure:

easy-db-lab up

This command creates:

What Happens During up

  1. Creates S3 bucket for cluster state
  2. Creates VPC with public subnets in your availability zones
  3. Provisions EC2 instances in parallel
  4. Waits for SSH availability
  5. Configures K3s cluster on all nodes
  6. Writes SSH config and environment files

Up Options

Option Description
--no-setup, -n Skip K3s setup and AxonOps configuration

Environment Setup

After up completes, source the environment file:

source env.sh

This configures your shell with:

See Shell Aliases for all available shortcuts.

Part 3: Configure Cassandra 5.0

With infrastructure running, configure and start Cassandra.

Step 1: Select Cassandra Version

easy-db-lab cassandra use 5.0

This command:

Available versions: 3.0, 3.11, 4.0, 4.1, 5.0, 5.0-HEAD, trunk

Step 2: Generate Configuration

easy-db-lab cassandra write-config

This creates cassandra.patch.yaml with cluster-specific settings:

cluster_name: "my-cluster"
num_tokens: 4
seed_provider:
  class_name: "org.apache.cassandra.locator.SimpleSeedProvider"
  parameters:
    seeds: "10.0.1.28"
hints_directory: "/mnt/db1/cassandra/hints"
data_file_directories:
  - "/mnt/db1/cassandra/data"
commitlog_directory: "/mnt/db1/cassandra/commitlog"
concurrent_reads: 64
concurrent_writes: 64
trickle_fsync: true
endpoint_snitch: "Ec2Snitch"

Auto-Injected Fields

listen_address and rpc_address are automatically injected per-node with each host's private IP address. You don't need to specify them in the patch file.

Step 3: Customize Configuration (Optional)

Edit cassandra.patch.yaml to customize settings:

# Example: Change token count
vim cassandra.patch.yaml

Common customizations:

Setting Description Default
num_tokens Virtual nodes per instance 4
concurrent_reads Max concurrent read operations 64
concurrent_writes Max concurrent write operations 64
endpoint_snitch Network topology snitch Ec2Snitch

Step 4: Apply Configuration

easy-db-lab cassandra update-config

This uploads and applies the patch to all Cassandra nodes.

To apply and restart Cassandra in one command:

easy-db-lab cassandra update-config --restart

Step 5: Start Cassandra

easy-db-lab cassandra start

Step 6: Verify Cluster

Check cluster status:

ssh db0 nodetool status

Or use the shell alias (after sourcing env.sh):

c-status

You should see all nodes in UN (Up/Normal) state.

Part 4: Working with Your Cluster

SSH Access

After sourcing env.sh:

ssh db0          # First Cassandra node
ssh db1          # Second Cassandra node
ssh stress0      # First stress node (if provisioned)
ssh control0     # Control node

Cassandra Management

# Stop Cassandra on all nodes
easy-db-lab cassandra stop

# Start Cassandra on all nodes
easy-db-lab cassandra start

# Restart Cassandra on all nodes
easy-db-lab cassandra restart

Filter to Specific Hosts

Most commands support the --hosts filter:

# Apply config only to db0 and db1
easy-db-lab cassandra update-config --hosts db0,db1

# Restart only db2
easy-db-lab cassandra restart --hosts db2

Download Configuration Files

To download the current configuration from nodes:

easy-db-lab cassandra download-config

This saves configuration files to a local directory named after the version (e.g., 5.0/).

Part 5: Shut Down

When finished, destroy the cluster infrastructure:

easy-db-lab down

Warning

This permanently destroys all EC2 instances, the VPC, and associated resources. S3 buckets are retained for state recovery.

Quick Reference

Task Command
Initialize cluster easy-db-lab init <name> [options]
Provision infrastructure easy-db-lab up
Initialize and provision easy-db-lab init <name> --up
Select Cassandra version easy-db-lab cassandra use <version>
Generate config patch easy-db-lab cassandra write-config
Apply configuration easy-db-lab cassandra update-config
Start Cassandra easy-db-lab cassandra start
Stop Cassandra easy-db-lab cassandra stop
Restart Cassandra easy-db-lab cassandra restart
Check cluster status ssh db0 nodetool status
Download config easy-db-lab cassandra download-config
Destroy cluster easy-db-lab down
Display hosts easy-db-lab hosts
Clean local files easy-db-lab clean

Next Steps