Allocation profiling
ClickHouse uses jemalloc as its global allocator. Jemalloc comes with tools for allocation sampling and profiling.
ClickHouse and Keeper allow you to control sampling using configs, query settings, SYSTEM commands and four letter word (4LW) commands in Keeper. There are several ways to inspect the results:
- Collect samples into
system.trace_logunder theJemallocSampletype for per-query analysis. - View live memory statistics and fetch heap profiles through the built-in jemalloc web UI (26.2+).
- Query the current heap profile directly from SQL using
system.jemalloc_profile_text(26.2+). - Flush heap profiles to disk and analyze them with
jeprof.
This guide is applicable for versions 25.9+. For older versions, please check allocation profiling for versions before 25.9.
Sampling allocations
To sample and profile allocations, start ClickHouse/Keeper with the jemalloc_enable_global_profiler config enabled:
jemalloc will sample allocations and store the information internally.
You can also enable sampling per query using the jemalloc_enable_profiler setting.
Because ClickHouse is an allocation-heavy application, jemalloc sampling may incur performance overhead.
Storing jemalloc samples in system.trace_log
You can store jemalloc samples in system.trace_log under the JemallocSample type.
To enable it globally, use the jemalloc_collect_global_profile_samples_in_trace_log config:
Because ClickHouse is an allocation-heavy application, collecting all samples in system.trace_log may incur high load.
You can also enable it per query using the jemalloc_collect_profile_samples_in_trace_log setting.
Example: analyzing memory usage of a query
First, run a query with the jemalloc profiler enabled and collect the samples into system.trace_log:
If ClickHouse was started with jemalloc_enable_global_profiler, you don't have to enable jemalloc_enable_profiler.
Same is true for jemalloc_collect_global_profile_samples_in_trace_log and jemalloc_collect_profile_samples_in_trace_log.
Flush the system.trace_log:
Then query it to get cumulative memory usage over time:
Find the time where memory usage was the highest:
Using that result, see which allocation stacks were most active at the peak:
Jemalloc web UI
This section is applicable for versions 26.2+.
ClickHouse provides a built-in web UI for viewing jemalloc memory statistics at the /jemalloc HTTP endpoint.
It displays live memory metrics with charts, including allocated, active, resident, and mapped memory, as well as per-arena and per-bin statistics.
You can also fetch global and per-query heap profiles directly from the UI.
To access it, open in your browser:
Fetching heap profiles from SQL
This section is applicable for versions 26.2+.
The system.jemalloc_profile_text system table lets you fetch and view the current jemalloc heap profile directly from SQL, without needing external tools or flushing to disk first.
The table has a single column:
| Column | Type | Description |
|---|---|---|
line | String | Line from the symbolized jemalloc heap profile. |
You can query the table directly — there is no need to flush a heap profile beforehand:
Output format
The output format is controlled by the jemalloc_profile_text_output_format setting, which supports three values:
raw— raw heap profile as produced by jemalloc.symbolized— jeprof-compatible format with embedded function symbols. Since symbols are already embedded,jeprofcan analyze the output without requiring the ClickHouse binary.collapsed(default) — FlameGraph-compatible collapsed stacks, one stack per line with the byte count.
For example, to get the raw profile:
To get symbolized output:
Additional settings
jemalloc_profile_text_symbolize_with_inline(Bool, default:true) — Whether to include inline frames when symbolizing. Disabling this speeds up symbolization significantly but loses precision as inlined function calls will not appear in the stacks. Only affectssymbolizedandcollapsedformats.jemalloc_profile_text_collapsed_use_count(Bool, default:false) — When using thecollapsedformat, aggregate by allocation count instead of bytes.
Example: generating a flame graph from SQL
Since the default output format is collapsed, you can pipe the output directly to FlameGraph:
To generate a flame graph by allocation count instead of bytes:
Flushing heap profiles to disk
If you need to save heap profiles as files for offline analysis with jeprof, you can flush them to disk.
By default, the heap profile file will be generated in /tmp/jemalloc_clickhouse._pid_._seqnum_.heap where _pid_ is the PID of ClickHouse and _seqnum_ is the global sequence number for the current heap profile.
For Keeper, the default file is /tmp/jemalloc_keeper._pid_._seqnum_.heap, and follows the same rules.
To flush the current profile:
- ClickHouse
- Keeper
It will return the location of the flushed profile.
A different location can be defined by appending the MALLOC_CONF environment variable with the prof_prefix option.
For example, if you want to generate profiles in the /data folder where the filename prefix will be my_current_profile, you can run ClickHouse/Keeper with the following environment variable:
The generated file will be appended to the prefix PID and sequence number.
Analyzing heap profile files with jeprof
After flushing heap profiles to disk, they can be analyzed using jemalloc's tool called jeprof. It can be installed in multiple ways:
- Using the system's package manager
- Cloning the jemalloc repo and running
autogen.shfrom the root folder. This will provide you with thejeprofscript inside thebinfolder
There are many different output formats available. Run jeprof --help for the full list of options.
Symbolized heap profiles
Starting from version 26.1+, ClickHouse automatically generates symbolized heap profiles when you flush using SYSTEM JEMALLOC FLUSH PROFILE.
The symbolized profile (with .symbolized extension) contains embedded function symbols and can be analyzed by jeprof without requiring the ClickHouse binary.
For example, when you run:
ClickHouse will return the path to the symbolized profile (e.g., /tmp/jemalloc_clickhouse.12345.0.heap.symbolized).
You can then analyze it directly with jeprof:
No binary required: When using symbolized profiles (.symbolized files), you don't need to provide the ClickHouse binary path to jeprof. This makes it much easier to analyze profiles on different machines or after the binary has been updated.
If you have an older non-symbolized heap profile and still have access to the ClickHouse binary, you can use the traditional approach:
For non-symbolized profiles, jeprof uses addr2line to generate stacktraces which can be really slow.
If that's the case, it is recommended to install an alternative implementation of the tool.
Alternatively, llvm-addr2line works equally well (But note, that llvm-objdump is not compatible with jeprof)
And later use it like this jeprof --tools addr2line:/usr/bin/llvm-addr2line,nm:/usr/bin/llvm-nm,objdump:/usr/bin/objdump,c++filt:/usr/bin/llvm-cxxfilt
When comparing two profiles, you can use the --base argument:
Examples
Using symbolized profiles (recommended):
- Generate a text file with each procedure written per line:
- Generate a PDF file with a call-graph:
Using non-symbolized profiles (requires binary):
- Generate a text file with each procedure written per line:
- Generate a PDF file with a call-graph:
Generating a flame graph
jeprof allows you to generate collapsed stacks for building flame graphs.
You need to use the --collapsed argument:
Or with a non-symbolized profile:
After that, you can use many different tools to visualize collapsed stacks.
The most popular is FlameGraph which contains a script called flamegraph.pl:
Another interesting tool is speedscope that allows you to analyze collected stacks in a more interactive way.
Additional options for the profiler
jemalloc has many different options available, which are related to the profiler. They can be controlled by modifying the MALLOC_CONF environment variable.
For example, the interval between allocation samples can be controlled with lg_prof_sample.
If you want to dump the heap profile every N bytes you can enable it using lg_prof_interval.
It is recommended to check jemallocs reference page for a complete list of options.
Other resources
ClickHouse/Keeper expose jemalloc related metrics in many different ways.
It's important to be aware that none of these metrics are synchronized with each other and values may drift.
System table asynchronous_metrics
System table jemalloc_bins
Contains information about memory allocations done via the jemalloc allocator in different size classes (bins) aggregated from all arenas.
System table jemalloc_stats (26.2+)
Returns the full output of malloc_stats_print() as a single string. Equivalent to the SYSTEM JEMALLOC STATS command.
Prometheus
All jemalloc related metrics from asynchronous_metrics are also exposed using the Prometheus endpoint in both ClickHouse and Keeper.
jmst 4LW command in Keeper
Keeper supports the jmst 4LW command which returns basic allocator statistics: