ClickStack + HyperDX Observability with Kubernetes Operators

January 23, 2026

by Afanasy Barbarov


The journey from basic ClickStack to production-grade observability with Kubernetes operators.

The starting point

I began with the simplest possible ClickStack deployment - a single Helm install that bundles everything: ClickHouse, OpenTelemetry Collector, MongoDB, and the HyperDX UI. Four pods, zero configuration headaches.

helm install clickstack clickstack/clickstack \
  --namespace observability \
  -f k8s/clickstack-values-echo.yaml

It worked: I could open the HyperDX dashboard and poke around the UI. But this wasn't production-ready. It had a single ClickHouse pod (no replication) and a single OTel Collector, which can't gather node-local data such as container logs and kubelet stats from every node. Nothing was highly available.

Why operators?

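ClickStack's docs recommend using Kubernetes operators for production. The idea is that Helm no longer manages individual pods. Instead, you describe the desired state in custom resources, whose types are defined by Custom Resource Definitions (CRDs). A specialized operator then reconciles the stateful application behind them: pods, services, volumes, and configuration.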

Two operators are needed:

OpenTelemetry Operator - Creates and manages OTel Collectors from OpenTelemetryCollector custom resources. Supports DaemonSets (one collector per node) and Deployments (cluster-wide collectors), among other modes.

Altinity ClickHouse Operator - Creates and manages ClickHouse clusters from ClickHouseInstallation custom resources. Handles replication, sharding, and user management.

Installing the operators

First, the OpenTelemetry Operator. I don't run cert-manager, so I let the chart auto-generate the admission webhook certificates:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts

helm install opentelemetry-operator open-telemetry/opentelemetry-operator \
  --namespace observability \
  --set admissionWebhooks.certManager.enabled=false \
  --set admissionWebhooks.autoGenerateCert.enabled=true \
  --set manager.resources.requests.cpu=100m \
  --set manager.resources.requests.memory=128Mi \
  --set manager.resources.limits.cpu=200m \
  --set manager.resources.limits.memory=256Mi \
  --wait

Then the ClickHouse Operator. I named the release cho to avoid ridiculously long pod names. With the default release name altinity-clickhouse-operator, pods come out as clickhouse-operator-altinity-clickhouse-operator-xyz:

helm repo add altinity https://helm.altinity.com

helm install cho altinity/altinity-clickhouse-operator \
  --namespace observability \
  --wait

The install prints PodSecurity warnings, but the pods run fine because the observability namespace is labeled privileged.
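For reference, here is a minimal sketch of a namespace labeled this way. It assumes the namespace is managed as a manifest rather than created with kubectl.

- pod-security.kubernetes.io/enforce is the standard Pod Security Admission label. The privileged level is what lets a collector DaemonSet mount host paths such as /var/log/pods.
- Warnings can still appear at install time, depending on the cluster's default warn level.

```yaml
# Minimal sketch (assumed): the observability namespace with the privileged
# Pod Security Standard enforced, so pods using hostPath volumes are admitted.
apiVersion: v1
kind: Namespace
metadata:
  name: observability
  labels:
    pod-security.kubernetes.io/enforce: privileged
```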

Creating the ClickHouse cluster

With the operator running, create a 2-replica ClickHouse cluster by applying a ClickHouseInstallation custom resource:

kubectl apply -f k8s/clickhouse-cluster.yaml

The operator creates the ClickHouse pods. Check status:

kubectl get chi -n observability

Service endpoint: clickhouse-echo.observability.svc.cluster.local
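The post doesn't show k8s/clickhouse-cluster.yaml. Below is a minimal sketch of a ClickHouseInstallation that would yield the clickhouse-echo service. It makes these assumptions:

- one shard with two replicas
- a user named otel (the one used in the verification query further down)
- a placeholder password and disk size

One caveat: the two replicas only replicate data if the tables use Replicated* engines and ClickHouse has a Keeper/ZooKeeper ensemble to coordinate through. The keeper host below is hypothetical.

```yaml
# Sketch of k8s/clickhouse-cluster.yaml (assumed contents, not the original file).
apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  name: echo                      # operator exposes it as service "clickhouse-echo"
  namespace: observability
spec:
  configuration:
    zookeeper:
      nodes:
        - host: keeper-echo       # hypothetical Keeper/ZooKeeper service
          port: 2181
    users:
      otel/password: "change-me"  # placeholder; prefer a Secret or password_sha256_hex
      otel/networks/ip:
        - "::/0"                  # accept connections from any address (tighten as needed)
      otel/profile: default
    clusters:
      - name: default             # assumed cluster name
        layout:
          shardsCount: 1
          replicasCount: 2        # two ClickHouse pods, one per replica
  defaults:
    templates:
      dataVolumeClaimTemplate: data-volume
  templates:
    volumeClaimTemplates:
      - name: data-volume
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 20Gi       # placeholder size
```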

The OTel Collector challenge

This is where things got interesting. I needed collectors to gather:

Container logs - from /var/log/pods on each node
Host metrics - CPU, memory, disk, network per node
Kubelet stats - pod/container resource usage from kubelet API
Kubernetes events - cluster-wide events (pod created, failed, etc.)
Pod status - which pods are Running, Pending, Failed

The catch: some data is node-specific (logs, host metrics, kubelet stats) and some is cluster-wide (events, pod status).

DaemonSet vs Deployment

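I learned the hard way: you can't collect node-specific data from a single Deployment. The kubeletstats receiver is normally pointed at the kubelet on the node where the collector runs, and filelog and hostmetrics read that node's filesystem. A single-replica Deployment runs on one node, so it only ever sees that node's kubelet, logs, and host metrics.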

Solution: two collectors.

DaemonSet (otel-collector.yaml) - runs on every node:

filelog receiver (container logs)
hostmetrics receiver (CPU, memory, disk, network)
kubeletstats receiver (pod/container metrics from local kubelet)
otlp receiver (for apps to send traces/metrics)
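The post doesn't reproduce otel-collector.yaml. As a rough sketch, a DaemonSet-mode OpenTelemetryCollector could look like this. It is trimmed to the otlp and hostmetrics receivers; filelog and kubeletstats slot into the same receivers map.

The following are assumptions, not the author's file:

- the contrib image
- the otel-collector service account
- the hypothetical clickhouse-otel Secret
- the use of the contrib clickhouse exporter to write directly to the clickhouse-echo service

```yaml
# Sketch of the per-node collector (assumed, trimmed to otlp + hostmetrics).
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel                      # operator names the DaemonSet "otel-collector"
  namespace: observability
spec:
  mode: daemonset                 # one collector pod per node
  image: otel/opentelemetry-collector-contrib:latest  # assumption; pin a version in practice
  serviceAccount: otel-collector  # assumption: created by the RBAC file
  volumes:
    - name: hostfs
      hostPath:
        path: /
  volumeMounts:
    - name: hostfs
      mountPath: /hostfs
      readOnly: true
  env:
    - name: CLICKHOUSE_PASSWORD
      valueFrom:
        secretKeyRef:
          name: clickhouse-otel   # hypothetical Secret
          key: password
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      hostmetrics:
        root_path: /hostfs        # read the node's /proc, /sys and mounts via the host mount
        collection_interval: 30s
        scrapers:
          cpu: {}
          memory: {}
          disk: {}
          filesystem: {}
          network: {}
    processors:
      batch: {}
    exporters:
      clickhouse:
        endpoint: tcp://clickhouse-echo.observability.svc.cluster.local:9000
        username: otel
        password: ${env:CLICKHOUSE_PASSWORD}
        database: default
    service:
      pipelines:
        metrics:
          receivers: [otlp, hostmetrics]
          processors: [batch]
          exporters: [clickhouse]
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [clickhouse]
        logs:
          receivers: [otlp]
          processors: [batch]
          exporters: [clickhouse]
```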

Deployment (otel-collector-cluster.yaml) - runs once, cluster-wide:

k8s_cluster receiver (pod status, deployment info)
k8sobjects receiver (kubernetes events)
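The cluster-wide collector, sketched under the same assumptions (contrib image, shared otel-collector service account, hypothetical clickhouse-otel Secret, direct clickhouse exporter). Running exactly one replica matters here. Every replica watches the same API objects, so a second one would report each event and pod metric twice.

```yaml
# Sketch of the cluster-wide collector (assumed contents of otel-collector-cluster.yaml).
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-cluster              # operator names the Deployment "otel-cluster-collector"
  namespace: observability
spec:
  mode: deployment
  replicas: 1                     # a single replica; more would duplicate the data
  image: otel/opentelemetry-collector-contrib:latest  # assumption; pin a version in practice
  serviceAccount: otel-collector  # assumption: same service account as the DaemonSet
  env:
    - name: CLICKHOUSE_PASSWORD
      valueFrom:
        secretKeyRef:
          name: clickhouse-otel   # hypothetical Secret
          key: password
  config:
    receivers:
      k8s_cluster:
        auth_type: serviceAccount
        collection_interval: 30s  # emits k8s.pod.phase, deployment replica counts, etc.
      k8sobjects:
        auth_type: serviceAccount
        objects:
          - name: events
            group: events.k8s.io
            mode: watch           # stream Kubernetes events as log records
    processors:
      batch: {}
    exporters:
      clickhouse:
        endpoint: tcp://clickhouse-echo.observability.svc.cluster.local:9000
        username: otel
        password: ${env:CLICKHOUSE_PASSWORD}
        database: default
    service:
      pipelines:
        metrics:
          receivers: [k8s_cluster]
          processors: [batch]
          exporters: [clickhouse]
        logs:
          receivers: [k8sobjects]
          processors: [batch]
          exporters: [clickhouse]
```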

kubectl apply -f k8s/otel-collector-rbac.yaml
kubectl apply -f k8s/otel-collector.yaml
kubectl apply -f k8s/otel-collector-cluster.yaml
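The RBAC file's contents aren't shown either. A plausible sketch of its binding side follows: a service account both collectors run as, bound to a ClusterRole. The names otel-collector match the clusterrole and clusterrolebinding deleted in the clean-reinstall steps. The service account name and namespace are assumptions. The ClusterRole's rules come up in the kubeletstats and k8s_cluster fixes.

```yaml
# Sketch (assumed): service account for both collectors, bound to the
# otel-collector ClusterRole.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector
  namespace: observability
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-collector
subjects:
  - kind: ServiceAccount
    name: otel-collector
    namespace: observability
```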

Fixing kubeletstats

The kubeletstats receiver was a pain. First, it couldn't resolve the node hostname: Talos Linux nodes have hostnames that don't resolve in DNS. I fixed this by exposing the node's IP to the collector as the K8S_NODE_IP environment variable through the downward API:

env:
  - name: K8S_NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
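The environment variable only helps once the receiver uses it. A sketch of the matching receiver config is below, as a fragment of the DaemonSet collector's spec. Port 10250 is the kubelet's default secure port. The collection interval and metric groups are assumptions. Skipping TLS verification is a common choice when the kubelet's serving certificate is self-signed.

```yaml
# Sketch (assumed): kubeletstats pointed at the node IP from the downward API
# instead of the node hostname. Fragment of the DaemonSet collector's spec.
spec:
  config:
    receivers:
      kubeletstats:
        auth_type: serviceAccount                   # use the pod's service account token
        endpoint: https://${env:K8S_NODE_IP}:10250  # kubelet on the node this pod runs on
        insecure_skip_verify: true                  # kubelet serving cert is often self-signed
        collection_interval: 30s
        metric_groups: [node, pod, container]
    # kubeletstats must also be listed under service.pipelines.metrics.receivers
```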

Next came 403 Forbidden: the collector's service account wasn't allowed to read the kubelet's /stats endpoint. Granting get on nodes/stats and nodes/proxy in a ClusterRole fixed it.
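A sketch of the kubelet-related part of that ClusterRole, assuming it is the otel-collector role bound to the collectors' service account. Kubelet requests under /stats are authorized against nodes/stats. Other kubelet paths, such as /pods (used for extra pod metadata), are authorized against nodes/proxy.

```yaml
# Sketch (assumed): kubelet access rules in the otel-collector ClusterRole.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
  - apiGroups: [""]
    resources: ["nodes/stats"]   # kubelet /stats/summary
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes/proxy"]   # other kubelet endpoints, e.g. /pods
    verbs: ["get"]
```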

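Then came deprecation warnings about the CPU utilization metrics. I fixed them with a feature gate, which makes the receiver emit the newer *.cpu.usage metrics instead of the deprecated *.cpu.utilization ones: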

args:
  feature-gates: "+receiver.kubeletstats.enableCPUUsageMetrics"

Fixing filelog

Container logs weren't appearing. It turned out that start_at: end only collects lines written after the collector starts, so I changed it to start_at: beginning to pick up existing logs too. The DaemonSet also needed /var/log/pods mounted from the host.
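A sketch of the filelog receiver together with the host mount it needs, as fields of the DaemonSet collector's spec. The following are assumptions:

- the exclude pattern, which skips the collector's own pods
- the container operator, which parses CRI/Docker log lines and derives pod, namespace, and container metadata from the file path; it requires a reasonably recent contrib build

```yaml
# Sketch (assumed): filelog receiver plus the /var/log/pods host mount,
# as fields of the DaemonSet OpenTelemetryCollector spec.
spec:
  volumes:
    - name: varlogpods
      hostPath:
        path: /var/log/pods
  volumeMounts:
    - name: varlogpods
      mountPath: /var/log/pods
      readOnly: true
  config:
    receivers:
      filelog:
        include:
          - /var/log/pods/*/*/*.log   # <namespace>_<pod>_<uid>/<container>/<n>.log
        exclude:
          - /var/log/pods/observability_otel-collector-*/*/*.log  # assumption: skip own logs
        start_at: beginning           # read existing files, not only new lines
        include_file_path: true       # the container operator needs log.file.path
        operators:
          - type: container           # parse the runtime log format, add k8s metadata
```

start_at: beginning has a side effect. Without a storage extension to checkpoint file offsets, a restarted collector re-reads every file and duplicates logs. The filelog receiver's storage setting, backed for example by the file_storage extension, is the usual remedy.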

Fixing pod status

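Pod status showed "Unknown" in HyperDX. The kubeletstats receiver only reports resource usage, not pod phase. Pod phase comes from the k8s_cluster receiver (as the k8s.pod.phase metric), which I hadn't configured. I added it to a separate Deployment because it reads cluster-wide state from the API server; in the DaemonSet, every node's collector would report every pod.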

It also hit RBAC errors: the k8s_cluster receiver needs read access to replicationcontrollers, services, resourcequotas, and more, so I updated the ClusterRole.
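A sketch of the extra rules this implies, to be appended to the same otel-collector ClusterRole. The list follows the permissions the k8s_cluster receiver documents for watching cluster state, plus read access to events.k8s.io events for k8sobjects. Treat it as an assumption about what the author's file contains, not a copy of it.

```yaml
# Sketch (assumed): additional otel-collector ClusterRole rules for
# k8s_cluster (cluster-wide object state) and k8sobjects (events).
rules:
  - apiGroups: [""]
    resources:
      - events
      - namespaces
      - namespaces/status
      - nodes
      - nodes/spec
      - pods
      - pods/status
      - replicationcontrollers
      - replicationcontrollers/status
      - resourcequotas
      - services
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["daemonsets", "deployments", "replicasets", "statefulsets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["autoscaling"]
    resources: ["horizontalpodautoscalers"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["events.k8s.io"]
    resources: ["events"]         # watched by the k8sobjects receiver
    verbs: ["get", "list", "watch"]
```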

Reconfiguring ClickStack

Finally, I pointed ClickStack at the operator-managed ClickHouse:

helm upgrade clickstack clickstack/clickstack \
  --namespace observability \
  -f k8s/clickstack-values-operators.yaml \
  --wait

The values file disables the bundled ClickHouse and OTel Collector, and overrides HyperDX's default connection to point at the operator-managed ClickHouse:

clickhouse:
  enabled: false
otel:
  enabled: false
hyperdx:
  defaultConnections: |
    [{"name": "Local ClickHouse", "host": "http://clickhouse-echo:8123", ...}]

I had to delete the MongoDB PVC once because MongoDB still held the old connection config; starting it fresh fixed the problem.

The final architecture

After all the fixes:

3x otel-collector (DaemonSet) - one per node, collecting logs, host metrics, kubelet stats
1x otel-cluster-collector (Deployment) - cluster-wide events and pod status
2x ClickHouse replicas - data replication
1x HyperDX - the UI
1x MongoDB - HyperDX metadata

Access the UI

kubectl port-forward svc/clickstack-app -n observability 3000:3000

Open http://localhost:3000

Verification

Check that logs are flowing:

kubectl exec -n observability <clickhouse-pod> -- \
  clickhouse-client --user otel --password <your-password> \
  -q "SELECT count() FROM otel_logs"

Check that all pods are healthy:

kubectl get pods -n observability

Files

| File | Purpose |
| --- | --- |
| `k8s/clickhouse-cluster.yaml` | ClickHouseInstallation custom resource (2 replicas) |
| `k8s/otel-collector.yaml` | OTel Collector DaemonSet (logs, host metrics, kubelet stats) |
| `k8s/otel-collector-cluster.yaml` | OTel Collector Deployment (k8s events, pod status) |
| `k8s/otel-collector-rbac.yaml` | RBAC for all collectors |
| `k8s/clickstack-values-operators.yaml` | ClickStack Helm values (operators mode) |

Clean reinstall

If you need to wipe and reinstall (order matters - delete CRs before operators):

# Delete CRs first (operators handle finalizers)
kubectl delete opentelemetrycollector otel otel-cluster -n observability
kubectl delete chi echo -n observability

# Delete cluster-wide RBAC
kubectl delete clusterrolebinding otel-collector
kubectl delete clusterrole otel-collector

# Uninstall Helm releases
helm uninstall clickstack -n observability
helm uninstall cho -n observability
helm uninstall opentelemetry-operator -n observability

# Delete data
kubectl delete pvc --all -n observability

Then reinstall in this order: operators first, then RBAC, then the custom resources, and ClickStack last.
