Real-Time Analytics with ClickHouse and JavaScript


#clickhouse #real-time #javascript #analytics #dashboard

Introduction

Real-time analytics are essential for modern dashboards, monitoring, and product analytics. ClickHouse, with its columnar storage, fast aggregations, and flexible ingestion options, pairs nicely with JavaScript frontends to deliver near-instant insights. This post walks through an end-to-end approach: ingesting events into ClickHouse, using materialized views for pre-aggregations, and rendering live dashboards with a JavaScript frontend over ClickHouse’s HTTP interface.

Why ClickHouse for Real-Time Analytics

  • Fast, scalable analytics on large data sets due to columnar storage and aggressive compression.
  • Real-time ingestion pathways via Kafka engine and HTTP inserts.
  • Support for materialized views to maintain up-to-date aggregates (e.g., per-minute counts) without re-running heavy queries.
  • Flexible querying with SQL, plus an HTTP API suitable for frontend apps.
  • Suitable for dashboards that need frequent updates (seconds to minutes).

Architecture overview

  • Data sources emit events (e.g., page views, clicks, sensor readings).
  • Ingestion layer pushes events into ClickHouse (Kafka engine or HTTP inserts).
  • ClickHouse stores raw events in a MergeTree family table.
  • A materialized view maintains minute-level aggregates (or other rollups) in a separate table for fast querying.
  • Frontend, built with JavaScript, polls or streams aggregated results via ClickHouse HTTP interface and renders live charts.

Ingesting data into ClickHouse

There are several common ingestion patterns. Here are two practical options.

  • Ingest raw events via HTTP inserts (good for low volume; see the sketch after this list)

  • Ingest via Kafka (recommended for high-volume streams)

    • You create a Kafka engine source table and a materialized view that streams into a target MergeTree table.

    • Example setup:

      CREATE DATABASE IF NOT EXISTS analytics;

      CREATE TABLE analytics.events (
          event_time DateTime,
          user_id UInt64,
          page String,
          event_type LowCardinality(String)
      ) ENGINE = MergeTree()
      ORDER BY (event_time);

      -- Kafka source (adapt broker/topic settings)
      CREATE TABLE analytics.events_kafka (
          event_time DateTime,
          user_id UInt64,
          page String,
          event_type LowCardinality(String)
      ) ENGINE = Kafka()
      SETTINGS kafka_broker_list = 'kafka1:9092,kafka2:9092',
               kafka_topic_list = 'analytics_events',
               kafka_group_name = 'analytics_ingest',
               kafka_format = 'JSONEachRow';

      -- Materialized view to populate the target table from Kafka
      CREATE MATERIALIZED VIEW analytics.events_mv TO analytics.events AS
      SELECT event_time, user_id, page, event_type
      FROM analytics.events_kafka;

    This pattern ensures events are ingested in real time into analytics.events and are immediately available for queries.
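For the HTTP path, you POST rows to ClickHouse's HTTP interface with the INSERT query in the URL and the rows in the request body. Here is a minimal sketch in JavaScript (assumes Node 18+ or a browser with fetch; the host is a placeholder and the event values are illustrative):

// Minimal HTTP ingestion sketch: POST a batch of events as JSONEachRow.
// Targets the analytics.events table from the setup above.
async function insertEvents(events) {
  // JSONEachRow expects one JSON object per line.
  const body = events.map(e => JSON.stringify(e)).join('\n');
  const url = 'http://your-clickhouse-host:8123/?query=' +
    encodeURIComponent('INSERT INTO analytics.events FORMAT JSONEachRow');
  const res = await fetch(url, { method: 'POST', body });
  if (!res.ok) {
    throw new Error(`Insert failed: ${res.status} ${await res.text()}`);
  }
}

// Usage: send a small batch (batching keeps the number of MergeTree parts low).
insertEvents([
  { event_time: '2024-01-01 12:00:00', user_id: 42, page: '/home', event_type: 'view' },
  { event_time: '2024-01-01 12:00:01', user_id: 43, page: '/docs', event_type: 'click' },
]).catch(console.error);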

Real-time aggregation with materialized views

Raw event data is often too granular for dashboards. A materialized view helps you maintain pre-aggregated data (e.g., per-minute counts) with minimal query latency.

  • Example minute-level aggregation (count of events per minute)

    • Create an aggregation table for the dashboard to read from. Because each insert block into the raw table produces its own partial counts, use a SummingMergeTree so ClickHouse merges rows with the same minute in the background:

      CREATE TABLE analytics.minute_views (
          event_minute DateTime,
          views UInt64
      ) ENGINE = SummingMergeTree()
      ORDER BY (event_minute);

    • Materialized view to populate the aggregation from the raw events:

      CREATE MATERIALIZED VIEW analytics.minute_views_mv TO analytics.minute_views AS
      SELECT toStartOfMinute(event_time) AS event_minute, count() AS views
      FROM analytics.events
      GROUP BY event_minute;

    • Querying the latest 15 minutes of aggregates (sum over views, since background merges may not yet have collapsed all partial rows):

      SELECT event_minute, sum(views) AS views
      FROM analytics.minute_views
      WHERE event_minute >= now() - INTERVAL 15 MINUTE
      GROUP BY event_minute
      ORDER BY event_minute;

    This approach keeps fast-running queries on the frontend and reduces load on the raw events table, while still allowing ad-hoc queries when needed.

Querying from the frontend using ClickHouse HTTP API

ClickHouse can serve data directly to a browser via its HTTP interface. A typical pattern is to request JSON-formatted data for charts.

  • Example query for the last 15 minutes, grouped by minute:

    SELECT toStartOfMinute(event_time) AS t, count(*) AS views
    FROM analytics.events
    WHERE event_time >= now() - INTERVAL 15 MINUTE
    GROUP BY t
    ORDER BY t
    FORMAT JSONCompact

  • Frontend fetch example (JavaScript):

    • Assumes your ClickHouse instance is reachable from the browser (consider CORS and authentication in production).
    • Uses JSONCompact for a compact payload.

Code snippet (frontend):

<!doctype html>
<html>
<head>
  <title>Real-Time Analytics</title>
  <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
  <!-- Chart.js time scales require a date adapter -->
  <script src="https://cdn.jsdelivr.net/npm/chartjs-adapter-date-fns/dist/chartjs-adapter-date-fns.bundle.min.js"></script>
</head>
<body>
  <canvas id="realtimeChart" width="800" height="400"></canvas>
  <script>
    const ctx = document.getElementById('realtimeChart').getContext('2d');
    const chart = new Chart(ctx, {
      type: 'line',
      data: { labels: [], datasets: [{ label: 'Views', data: [] }] },
      options: { scales: { x: { type: 'time', time: { tooltipFormat: 'HH:mm' } } } }
    });

    async function fetchData() {
      const q = `SELECT toStartOfMinute(event_time) AS t, count(*) AS views
                 FROM analytics.events
                 WHERE event_time >= now() - INTERVAL 15 MINUTE
                 GROUP BY t
                 ORDER BY t
                 FORMAT JSONCompact`;
      const url = 'http://your-clickhouse-host:8123/?query=' + encodeURIComponent(q);
      // Plain GET; a Content-Type header is unnecessary here and would
      // trigger a CORS preflight.
      const res = await fetch(url);
      if (!res.ok) {
        console.error('ClickHouse query failed', res.status);
        return;
      }
      const data = await res.json();
      // JSONCompact returns rows as arrays under "data": [ [t, views], ... ]
      const rows = data.data || [];
      // ClickHouse serializes DateTime as 'YYYY-MM-DD HH:MM:SS'; insert the
      // 'T' so Date parses it consistently across browsers.
      const labels = rows.map(r => new Date(r[0].replace(' ', 'T')));
      const values = rows.map(r => r[1]);
      chart.data.labels = labels;
      chart.data.datasets[0].data = values;
      chart.update();
    }

    // Initial fetch and then refresh every 5 seconds
    fetchData();
    setInterval(fetchData, 5000);
  </script>
</body>
</html>

Notes:

  • Replace http://your-clickhouse-host:8123 with your actual ClickHouse HTTP endpoint.
  • In production, add authentication (ClickHouse has user-based access control) and enable CORS for your frontend domain; better yet, put a small backend proxy in front of ClickHouse (see the sketch after these notes).
  • For a busy dashboard, query the pre-aggregated analytics.minute_views table instead of raw events, as described in the aggregation section.
  • Chart.js is used here for simplicity; you can substitute D3.js, Vega-Lite, or any charting library you prefer.
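To avoid exposing ClickHouse to the browser at all, a thin backend endpoint can run one fixed query on the frontend's behalf. A minimal sketch using Node's built-in http module and fetch (Node 18+); the host, user name, route, and environment variable are placeholders you would adapt:

// Minimal proxy sketch: the browser calls /api/minute-views, and only this
// fixed query ever reaches ClickHouse.
const http = require('http');

const CLICKHOUSE = 'http://your-clickhouse-host:8123';
const QUERY = `SELECT event_minute, sum(views) AS views
               FROM analytics.minute_views
               WHERE event_minute >= now() - INTERVAL 15 MINUTE
               GROUP BY event_minute
               ORDER BY event_minute
               FORMAT JSONCompact`;

http.createServer(async (req, res) => {
  if (req.url !== '/api/minute-views') {
    res.writeHead(404).end();
    return;
  }
  try {
    const chRes = await fetch(CLICKHOUSE + '/?query=' + encodeURIComponent(QUERY), {
      headers: {
        'X-ClickHouse-User': 'dashboard_reader',            // least-privilege user
        'X-ClickHouse-Key': process.env.CH_PASSWORD || '',  // never hard-code secrets
      },
    });
    const body = await chRes.text();
    res.writeHead(chRes.status, { 'Content-Type': 'application/json' });
    res.end(body);
  } catch (err) {
    res.writeHead(502).end(JSON.stringify({ error: String(err) }));
  }
}).listen(3000);

The frontend snippet then fetches /api/minute-views instead of hitting ClickHouse directly, which also sidesteps CORS entirely.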

End-to-end quickstart (minimal steps)

If you want a quick start, here’s a compact blueprint you can adapt.

  1. Set up ClickHouse (local or in a container)
     • Start the ClickHouse server and create a database:

       CREATE DATABASE IF NOT EXISTS analytics;

  2. Create core tables
     • Raw events:

       CREATE TABLE analytics.events (
           event_time DateTime,
           user_id UInt64,
           page String,
           event_type LowCardinality(String)
       ) ENGINE = MergeTree()
       ORDER BY (event_time);

     • Optional: aggregation table (per minute):

       CREATE TABLE analytics.minute_views (
           event_minute DateTime,
           views UInt64
       ) ENGINE = SummingMergeTree()
       ORDER BY (event_minute);

  3. Ingestion path
     • For low volume, use HTTP inserts as shown above.
     • For streaming at scale, configure Kafka ingestion: create analytics.events_kafka and a materialized view to feed analytics.events, then a separate MV to analytics.minute_views as shown earlier.
  4. Frontend
     • Build a small HTML page (as in the snippet) that queries ClickHouse via the HTTP interface and renders a chart.
     • Use JSONCompact or JSON for data interchange.
  5. Observability and tuning
     • Order tables by event_time (the ORDER BY key doubles as the primary index) and use toStartOfMinute for aggregations.
     • Consider TTL and partitioning on event_time for managing cold data (see the sketch after this list).
     • Use LowCardinality(String) for low-cardinality string columns (few distinct values) to save space and speed up scans.
     • Lock down access: users, TLS, and IP allowlists; enable user-based authentication and limit query complexity if exposing publicly.
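As a sketch of the retention point above: partitioning and TTL are declared when the table is created, and the DDL can be sent over the same HTTP interface used throughout this post. The table name, host, and 30-day retention window here are illustrative assumptions:

// Hypothetical retention setup: daily partitions plus a 30-day TTL on raw
// events. The DDL is POSTed to ClickHouse's HTTP interface.
async function runDDL(sql) {
  const res = await fetch('http://your-clickhouse-host:8123/', {
    method: 'POST',
    body: sql,
  });
  if (!res.ok) throw new Error(await res.text());
}

runDDL(`
  CREATE TABLE IF NOT EXISTS analytics.events_with_ttl (
      event_time DateTime,
      user_id UInt64,
      page String,
      event_type LowCardinality(String)
  ) ENGINE = MergeTree()
  PARTITION BY toYYYYMMDD(event_time)  -- one partition per day: cheap drops
  ORDER BY (event_time)
  TTL event_time + INTERVAL 30 DAY     -- raw rows expire after 30 days
`).catch(console.error);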

Performance considerations

  • Use materialized views to pre-aggregate heavy queries; avoid scanning the raw events table for dashboards.
  • Keep the ingestion path resilient: Kafka provides backpressure handling; HTTP inserts should be batched if you scale beyond a few hundred inserts per second (see the sketch after this list).
  • Tune MergeTree parameters (e.g., index_granularity) based on your query patterns and retention.
  • Consider data retention policies: short-term raw data with long-term aggregates in cheaper storage or different engines.
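A minimal client-side batching sketch, assuming the insertEvents helper from the ingestion section; the one-second flush interval is an arbitrary starting point:

// Buffer events in memory and flush periodically, so ClickHouse receives a
// few large inserts instead of many tiny ones (each insert creates a part).
const buffer = [];

function trackEvent(event) {
  buffer.push(event);
}

async function flush() {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length);  // take everything queued
  try {
    await insertEvents(batch);  // helper from the ingestion section above
  } catch (err) {
    console.error('Flush failed, re-queueing batch', err);
    buffer.unshift(...batch);   // naive retry: put the batch back
  }
}

setInterval(flush, 1000);  // flush once per second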

Security considerations

  • Do not expose the ClickHouse HTTP endpoint directly to the public internet.
  • Use authentication and TLS for all HTTP queries.
  • Enforce least-privilege users and limit the types of queries allowed from the frontend.
  • Implement rate limiting on the frontend to prevent abuse and accidental DoS.

Conclusion

Real-time analytics with ClickHouse and JavaScript is a practical, scalable approach for dashboards and product analytics. By ingesting data efficiently (Kafka or HTTP), maintaining per-minute aggregates via materialized views, and querying through ClickHouse’s HTTP API, you can deliver responsive, up-to-date visuals in the browser. Start with a simple setup, iterate on your ingestion and aggregation strategies, and you’ll have a robust real-time analytics stack that’s both fast and maintainable.