From 90 minutes to 3 seconds: putting ClickHouse behind our analytics
Once a month, one of our reports took about 90 minutes to run. It was the kind of job you kicked off and then went to make coffee, twice. Today the same report finishes in about three seconds. This is the story of how we closed that gap, and the things we learned about ClickHouse that were not obvious from the docs.
The 90-minute report
The report is an MSP session revenue breakdown. Conceptually it is simple: for a given month, take every charging session, throw away the ones that are deleted, excluded, or already credited, convert each one into a common currency, then fold in roaming hub fees and credit sessions. Group by provider and month, and you have your revenue picture.
In practice that one sentence hides a lot of work:
- Scan every MSP session for the month.
- Join exchange rates and convert currency on the fly.
- Join roaming hub fees.
- Join and subtract credit sessions.
- Aggregate the lot per provider.
On MongoDB, as a document store built for transactional workloads, this is an aggregation pipeline fighting well outside its weight class. As the data grew, the monthly export crept up to around an hour and a half.
Why the document store struggled
Nothing here is a knock on MongoDB. It is an excellent operational database, and it backs the platform happily. The problem is that this is an OLAP workload wearing an OLTP database’s clothing: scan millions of rows, join a handful of reference sets, do arithmetic across all of them, and collapse the result. Row-oriented storage has to walk whole documents to read a few fields, and on-the-fly currency conversion across joins multiplies the cost. It was the wrong tool, used heroically.
Enter ClickHouse
ClickHouse is a column-oriented SQL database built for online analytical processing. Because it stores each column separately, a query reads only the columns it actually touches, compresses them hard, and distributes the work. It speaks ordinary SQL, and it scales horizontally. For a “scan a lot, aggregate a lot” job, it is built for exactly this.
The catch: our source of truth still lives in MongoDB. So the first job was getting data across, and keeping it current.
Mirroring change streams
We mirror MongoDB change streams into ClickHouse. Every insert, update, and delete on the collections we care about is streamed into a matching ClickHouse table, so the analytical copy tracks the operational one continuously rather than through nightly dumps.
That immediately raises a question: ClickHouse loves append-only data, but a change stream is full of updates and deletes. How do you keep a moving target in a store that would rather you never changed a row?
Table engines, and the trick to staying current
In ClickHouse the table engine decides how and where data is stored, which queries are supported, how indexes work, and how concurrency and replication behave. Two matter here.
MergeTree is the default. Think of it as append-only: if we used it for our
change-stream tables, we would keep the full history of every change forever.
ReplacingMergeTree is what we actually use. It adds an ORDER BY key that behaves
like a primary key, and it lets rows sharing that key be merged down to the latest version
only. A version column tells it which row wins, and a delete flag lets rows be removed.
There are two subtleties worth internalising:
- The
ORDER BYkey is the physical layout. It is not just an index, it is the literal order rows are written to disk. Choose it to match how you query. - Deletes are tombstones, not removals. A deleted row stays in the table with its delete flag set. Otherwise ClickHouse could not tell whether a later insert came before or after the delete. As a bonus, this means you can undelete.
Compaction, the background merge that collapses old versions, happens whenever ClickHouse decides it is worth the cost. Which leads to the keyword everyone meets eventually.
The FINAL keyword, and its price
Because compaction is lazy, a plain SELECT can still see un-merged duplicates and stale
versions. FINAL forces the merge at read time so you get the deduplicated truth. It is
correct, and it is not free:
SELECT count() FROM invoices;
-- 9,238,585 rows. Elapsed: 0.002 sec.
SELECT count() FROM invoices FINAL;
-- 9,107,470 rows. Elapsed: 0.366 sec. Processed 10.35M rows, 350 MB.
Same table, two very different answers and two very different costs. The lesson is not
“avoid FINAL”, it is “know when you need it”. Use it where correctness demands the merged
view, and design your tables so the hot paths rarely have to.
A sharp edge: ORDER BY meets migrations
Here is the one that bit us. To speed up per-provider queries, we tried adding
provider_id to the ORDER BY key on the sessions table. Sensible on paper. Then sessions
started coming back as duplicates.
The reason: the ORDER BY key is what defines row identity for deduplication. Change the
key and you change what “the same row” means, and our historical migration data no longer
lined up under the new key, so records that should have merged simply did not.
The safer move is a projection: keep the base table keyed by id, and let ClickHouse
maintain an alternate physical ordering just for the queries that need it.
CREATE TABLE IF NOT EXISTS changestream.charges ON CLUSTER 'default' (
id String,
version UInt64,
-- [..]
is_deleted Boolean,
PROJECTION projection_provider_year_month (
SELECT * ORDER BY (provider_id, year, month, id)
)
) ENGINE = ReplicatedReplacingMergeTree(version, is_deleted)
ORDER BY (id);
Identity stays stable on id; the provider-shaped queries still get their fast ordering.
Refreshable materialised views
With the data modelled well, the revenue breakdown query itself runs in about three seconds. Fast, but still too slow to serve interactively at scale, and it recomputes the same thing for every reader. So we materialise it. A refreshable view recomputes the whole result on a schedule into a target table, and reads become instant lookups:
CREATE MATERIALIZED VIEW IF NOT EXISTS
changestream.msp_sessions_revenue_breakdown_mv
ON CLUSTER 'default'
REFRESH EVERY 1 HOUR
TO changestream.msp_sessions_revenue_breakdown;
The three-second query runs once an hour in the background; everyone else reads a table that is already computed.
The payoff
The monthly session export went from roughly 90 minutes to about 3 seconds. And because ClickHouse can emit CSV or JSON directly, the export stopped being a pipeline and became a query.
Two notes for the curious
SELECT *is not cheap in a columnar store. It reads every column off disk, even the ones you ignore. Name the columns you want.- No
NULLby default. You can opt intoNullable, but it adds a hidden column to track which values are null. Prefer a sensible default where you can.
None of this is exotic once it clicks (pun intended). The mental shift is the interesting part: stop thinking in documents and rows, start thinking in columns and physical order, and a report you used to schedule around becomes something you barely notice running.