2023 Changelog
Table of Contents
ClickHouse release v23.12, 2023-12-28
ClickHouse release v23.11, 2023-12-06
ClickHouse release v23.10, 2023-11-02
ClickHouse release v23.9, 2023-09-28
ClickHouse release v23.8 LTS, 2023-08-31
ClickHouse release v23.7, 2023-07-27
ClickHouse release v23.6, 2023-06-30
ClickHouse release v23.5, 2023-06-08
ClickHouse release v23.4, 2023-04-26
ClickHouse release v23.3 LTS, 2023-03-30
ClickHouse release v23.2, 2023-02-23
ClickHouse release v23.1, 2023-01-25
Changelog for 2022
ClickHouse release 23.12, 2023-12-28
Backward Incompatible Change
- Fix check for non-deterministic functions in TTL expressions. Previously, you could create a TTL expression with non-deterministic functions in some cases, which could lead to undefined behavior later. This fixes #37250. Disallow TTL expressions that don't depend on any columns of a table by default. It can be allowed back by `SET allow_suspicious_ttl_expressions = 1` or `SET compatibility = '23.11'`. Closes #37286. #51858 (Alexey Milovidov).
- The MergeTree setting `clean_deleted_rows` is deprecated, it has no effect anymore. The `CLEANUP` keyword for the `OPTIMIZE` is not allowed by default (it can be unlocked with the `allow_experimental_replacing_merge_with_cleanup` setting). #58267 (Alexander Tokmakov). This fixes #57930. This closes #54988. This closes #54570. This closes #50346. This closes #47579. The feature has to be removed because it is not good. We have to remove it as quickly as possible, because there is no other option. #57932 (Alexey Milovidov).
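The TTL change can be illustrated with a minimal sketch. The tables `ttl_demo` and `ttl_constant_demo` are hypothetical names invented for this example, and the exact error raised for the rejected case may differ between versions.

```sql
-- Hypothetical tables, used only for illustration.
-- A TTL that depends on a column of the table is accepted as before.
CREATE TABLE ttl_demo
(
    event_time DateTime,
    value UInt32
)
ENGINE = MergeTree
ORDER BY value
TTL event_time + INTERVAL 1 MONTH;

-- A TTL expression that depends on no column is expected to be rejected
-- by default starting with 23.12. The setting named in the entry above
-- re-enables it for the current session.
SET allow_suspicious_ttl_expressions = 1;

CREATE TABLE ttl_constant_demo
(
    event_time DateTime,
    value UInt32
)
ENGINE = MergeTree
ORDER BY value
TTL toDateTime('2030-01-01 00:00:00');
```

Setting `compatibility = '23.11'` is the broader alternative mentioned above; it restores other pre-23.12 defaults as well, so the narrower setting is usually the safer choice.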
New Feature
- Implement Refreshable Materialized Views, requested in #33919. #56946 (Michael Kolupaev, Michael Guzov).
- Introduce `PASTE JOIN`, which allows users to join tables without `ON` clause simply by row numbers. Example: `SELECT * FROM (SELECT number AS a FROM numbers(2)) AS t1 PASTE JOIN (SELECT number AS a FROM numbers(2) ORDER BY a DESC) AS t2`. #57995 (Yarik Briukhovetskyi).
- The `ORDER BY` clause now supports specifying `ALL`, meaning that ClickHouse sorts by all columns in the `SELECT` clause. Example: `SELECT col1, col2 FROM tab WHERE [...] ORDER BY ALL`. #57875 (zhongyuankai).
- Added a new mutation command `ALTER TABLE <table> APPLY DELETED MASK`, which allows to enforce applying of mask written by lightweight delete and to remove rows marked as deleted from disk. #57433 (Anton Popov).
- A handler `/binary` opens a visual viewer of symbols inside the ClickHouse binary. #58211 (Alexey Milovidov).
- Added a new SQL function `sqid` to generate Sqids (https://sqids.org/), example: `SELECT sqid(125, 126)`. #57512 (Robert Schulze).
- Add a new function `seriesPeriodDetectFFT` to detect series period using FFT. #57574 (Bhavna Jindal).
- Add an HTTP endpoint for checking if Keeper is ready to accept traffic. #55876 (Konstantin Bogdanov).
- Add 'union' mode for schema inference. In this mode the resulting table schema is the union of all files schemas (so schema is inferred from each file). The mode of schema inference is controlled by a setting `schema_inference_mode` with two possible values: `default` and `union`. Closes #55428. #55892 (Kruglov Pavel).
- Add new setting `input_format_csv_try_infer_numbers_from_strings` that allows to infer numbers from strings in CSV format. Closes #56455. #56859 (Kruglov Pavel).
- When the number of databases or tables exceeds a configurable threshold, show a warning to the user. #57375 (凌涛).
- Dictionary with `HASHED_ARRAY` (and `COMPLEX_KEY_HASHED_ARRAY`) layout supports `SHARDS` similarly to `HASHED`. #57544 (vdimir).
- Add asynchronous metrics for total primary key bytes and total allocated primary key bytes in memory. #57551 (Bharat Nallan).
- Add `SHA512_256` function. #57645 (Bharat Nallan).
- Add `FORMAT_BYTES` as an alias for `formatReadableSize`. #57592 (Bharat Nallan).
- Allow passing optional session token to the `s3` table function. #57850 (Shani Elharrar).
- Introduce a new setting `http_make_head_request`. If it is turned off, the URL table engine will not do a HEAD request to determine the file size. This is needed to support inefficient, misconfigured, or not capable HTTP servers. #54602 (Fionera).
- It is now possible to refer to ALIAS column in index (non-primary-key) definitions (issue #55650). Example: `CREATE TABLE tab(col UInt32, col_alias ALIAS col + 1, INDEX idx (col_alias) TYPE minmax) ENGINE = MergeTree ORDER BY col;`. #57546 (Robert Schulze).
- Added a new setting `readonly` which can be used to specify an S3 disk is read only. It can be useful to create a table on a disk of `s3_plain` type, while having read only access to the underlying S3 bucket. #57977 (Pengyuan Bian).
- The primary key analysis in MergeTree tables will now be applied to predicates that include the virtual column `_part_offset` (optionally with `_part`). This feature can serve as a special kind of a secondary index. #58224 (Amos Bird).
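A few of the new 23.12 features can be tried together in one short session. This is only a sketch: the table `events` and its columns are hypothetical, and results are omitted because they depend on the server version.

```sql
-- PASTE JOIN: rows are matched purely by position, so there is no ON clause.
SELECT *
FROM (SELECT number AS a FROM numbers(2)) AS t1
PASTE JOIN (SELECT number AS b FROM numbers(2) ORDER BY b DESC) AS t2;

-- ORDER BY ALL: sorts by every column of the SELECT clause, left to right.
SELECT number % 3 AS bucket, number AS n
FROM numbers(6)
ORDER BY ALL;

-- Lightweight delete followed by APPLY DELETED MASK.
-- `events` is a hypothetical table created just for this sketch.
CREATE TABLE events (id UInt64, payload String)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO events SELECT number, toString(number) FROM numbers(10);

DELETE FROM events WHERE id % 2 = 0;          -- only marks rows as deleted
ALTER TABLE events APPLY DELETED MASK;        -- physically removes the marked rows

-- FORMAT_BYTES is an alias, so both columns should be identical.
SELECT FORMAT_BYTES(123456789) AS via_alias,
       formatReadableSize(123456789) AS via_original;

-- Switch schema inference to the new 'union' mode for this session.
SET schema_inference_mode = 'union';
```

The second subquery of the `PASTE JOIN` uses the alias `b` rather than `a` only to keep the output column names distinct.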
Performance Improvement
- Extract non-intersecting parts ranges from MergeTree table during FINAL processing. That way we can avoid additional FINAL logic for these non-intersecting parts ranges. When the number of duplicate values with the same primary key is low, performance will be almost the same as without FINAL. Improve reading performance for MergeTree FINAL when `do_not_merge_across_partitions_select_final` setting is set. #58120 (Maksim Kita).
- Made copy between s3 disks using a s3-server-side copy instead of copying through the buffer. Improves `BACKUP/RESTORE` operations and `clickhouse-disks copy` command. #56744 (MikhailBurdukov).
- Hash JOIN respects setting `max_joined_block_size_rows` and does not produce large blocks for `ALL JOIN`. #56996 (vdimir).
- Release memory for aggregation earlier. This may avoid unnecessary external aggregation. #57691 (Nikolai Kochetov).
- Improve performance of string serialization. #57717 (Maksim Kita).
- Support trivial count optimization for `Merge`-engine tables. #57867 (skyoct).
- Optimized aggregation in some cases. #57872 (Anton Popov).
- The `hasAny` function can now take advantage of the full-text skipping indices. #57878 (Jpnock).
- Function `if(cond, then, else)` (and its alias `cond ? then : else`) was optimized to use branch-free evaluation. #57885 (zhanglistar).
- MergeTree automatically derives the `do_not_merge_across_partitions_select_final` setting if partition key expression contains only columns from primary key expression. #58218 (Maksim Kita).
- Speedup `MIN` and `MAX` for native types. #58231 (Raúl Marín).
- Implement `SLRU` cache policy for filesystem cache. #57076 (Kseniia Sumarokova).
- The limit for the number of connections per endpoint for background fetches was raised from `15` to the value of `background_fetches_pool_size` setting. The MergeTree-level setting `replicated_max_parallel_fetches_for_host` became obsolete. The MergeTree-level settings `replicated_fetches_http_connection_timeout`, `replicated_fetches_http_send_timeout` and `replicated_fetches_http_receive_timeout` are moved to the Server-level. The setting `keep_alive_timeout` is added to the list of Server-level settings. #57523 (Nikita Mikhaylov).
- Make querying `system.filesystem_cache` not memory intensive. #57687 (Kseniia Sumarokova).
- Reduce memory usage on strings deserialization. #57787 (Maksim Kita).
- More efficient constructor for Enum - it makes sense when Enum has a boatload of values. #57887 (Duc Canh Le).
- An improvement for reading from the filesystem cache: always use `pread` method. #57970 (Nikita Taranov).
- Add optimization for AND notEquals chain in logical expression optimizer. This optimization is only available with the experimental Analyzer enabled. #58214 (Kevin Mingtarja).
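The `FINAL`-related entries above can be sketched with a table whose partition key uses only primary key columns. The table `visits` and its columns are hypothetical, and whether the setting is derived automatically depends on the server version.

```sql
-- Hypothetical table: the partition key (toYYYYMM(day)) uses only `day`,
-- which is also part of the primary key, so rows with the same key
-- always live in the same partition.
CREATE TABLE visits
(
    day Date,
    user_id UInt64,
    hits UInt32
)
ENGINE = ReplacingMergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (day, user_id);

-- Explicit form: tell FINAL not to merge rows across partitions.
SELECT day, user_id, hits
FROM visits FINAL
SETTINGS do_not_merge_across_partitions_select_final = 1;

-- Both spellings of the conditional are covered by the branch-free optimization.
SELECT
    number,
    if(number % 2 = 0, 'even', 'odd') AS via_function,
    (number % 2 = 0) ? 'even' : 'odd' AS via_operator
FROM numbers(4);
```

With the automatic derivation described above, the explicit `SETTINGS` clause should be unnecessary for a table shaped like this one. It is spelled out here so the query behaves the same on older versions.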
Improvement
- Support for soft memory limit in Keeper. It will refuse requests if the memory usage is close to the maximum. #57271 (Han Fei). #57699 (Han Fei).
- Make inserts into distributed tables handle updated cluster configuration properly. When the list of cluster nodes is dynamically updated, the Directory Monitor of the distribution table will update it. #42826 (zhongyuankai).
- Do not allow creating a replicated table with inconsistent merge parameters. #56833 (Duc Canh Le).
- Show uncompressed size in `system.tables`. #56618. #57186 (Chen Lixiang).
- Add `skip_unavailable_shards` as a setting for `Distributed` tables that is similar to the corresponding query-level setting. Closes #43666. #57218 (Gagan Goel).
- The function `substring` (aliases: `substr`, `mid`) can now be used with `Enum` types. Previously, the first function argument had to be a value of type `String` or `FixedString`. This improves compatibility with 3rd party tools such as Tableau via MySQL interface. #57277 (Serge Klochkov).
- Function `format` now supports arbitrary argument types (instead of only `String` and `FixedString` arguments). This is important to calculate `SELECT format('The {0} to all questions is {1}', 'answer', 42)`. #57549 (Robert Schulze).
- Allows to use the `date_trunc` function with a case-insensitive first argument. Both cases are now supported: `SELECT date_trunc('day', now())` and `SELECT date_trunc('DAY', now())`. #57624 (Yarik Briukhovetskyi).
- Better hints when a table doesn't exist. #57342 (Bharat Nallan).
- Allow to overwrite `max_partition_size_to_drop` and `max_table_size_to_drop` server settings in query time. #57452 (Jordi Villar).
- Slightly better inference of unnamed tuples in JSON formats. #57751 (Kruglov Pavel).
- Add support for read-only flag when connecting to Keeper (fixes #53749). #57479 (Mikhail Koviazin).
- Fix possible distributed sends stuck due to "No such file or directory" (during recovering a batch from disk). Fix possible issues with `error_count` from `system.distribution_queue` (in case of `distributed_directory_monitor_max_sleep_time_ms` >5min). Introduce profile event to track async INSERT failures: `DistributedAsyncInsertionFailures`. #57480 (Azat Khuzhin).
- Support PostgreSQL generated columns and default column values in `MaterializedPostgreSQL` (experimental feature). Closes #40449. #57568 (Kseniia Sumarokova).
- Allow to apply some filesystem cache config settings changes without server restart. #57578 (Kseniia Sumarokova).
- Properly handling PostgreSQL table structure with empty array. #57618 (Mike Kot).
- Expose the total number of errors occurred since last server restart as a `ClickHouseErrorMetric_ALL` metric. #57627 (Nikita Mikhaylov).
- Allow nodes in the configuration file with `from_env`/`from_zk` reference and non empty element with replace=1. #57628 (Azat Khuzhin).
- A table function `fuzzJSON` which allows generating a lot of malformed JSON for fuzzing. #57646 (Julia Kartseva).
- Allow IPv6 to UInt128 conversion and binary arithmetic. #57707 (Yakov Olkhovskiy).
- Add a setting for the async inserts deduplication cache that controls how long we wait for a cache update. Deprecate setting `async_block_ids_cache_min_update_interval_ms`. Now cache is updated only in case of conflicts. #57743 (alesapin).
- `sleep()` function now can be cancelled with `KILL QUERY`. #57746 (Vitaly Baranov).
- Forbid `CREATE TABLE ... AS SELECT` queries for `Replicated` table engines in the experimental `Replicated` database because they are not supported. Reference #35408. #57796 (Nikolay Degterinsky).
- Fix and improve transforming queries for external databases, to recursively obtain all compatible predicates. #57888 (flynn).
- Support dynamic reloading of the filesystem cache size. Closes #57866. #57897 (Kseniia Sumarokova).
- Correctly support `system.stack_trace` for threads with blocked SIGRTMIN (these threads can exist in low-quality external libraries such as Apache rdkafka). #57907 (Azat Khuzhin). Also, send the signal to a thread only if it is not blocked, to avoid waiting for `storage_system_stack_trace_pipe_read_timeout_ms` when it does not make any sense. #58136 (Azat Khuzhin).
- Tolerate keeper failures in the quorum inserts' check. #57986 (Raúl Marín).
- Add max/peak RSS (`MemoryResidentMax`) into system.asynchronous_metrics. #58095 (Azat Khuzhin).
- This PR allows users to use s3-style links (`https://` and `s3://`) without mentioning region if it's not default. Also find the correct region if the user mentioned the wrong one. #58148 (Yarik Briukhovetskyi).
- `clickhouse-format --obfuscate` will know about Settings, MergeTreeSettings, and time zones and keep their names unchanged. #58179 (Alexey Milovidov).
- Added explicit `finalize()` function in `ZipArchiveWriter`. Simplify too complicated code in `ZipArchiveWriter`. This fixes #58074. #58202 (Vitaly Baranov).
- Make caches with the same path use the same cache objects. This behaviour existed before, but was broken in 23.4. If such caches with the same path have different set of cache settings, an exception will be thrown, that this is not allowed. #58264 (Kseniia Sumarokova).
- Parallel replicas (experimental feature): friendly settings #57542 (Igor Nikonov).
- Parallel replicas (experimental feature): announcement response handling improvement #57749 (Igor Nikonov).
- Parallel replicas (experimental feature): give more respect to `min_number_of_marks` in `ParallelReplicasReadingCoordinator` #57763 (Nikita Taranov).
- Parallel replicas (experimental feature): disable parallel replicas with IN (subquery) #58133 (Igor Nikonov).
- Parallel replicas (experimental feature): add profile event 'ParallelReplicasUsedCount' #58173 (Igor Nikonov).
- Non POST requests such as HEAD will be readonly similar to GET. #58060 (San).
- Add `bytes_uncompressed` column to `system.part_log` #58167 (Jordi Villar).
- Add base backup name to `system.backups` and `system.backup_log` tables #58178 (Pradeep Chhetri).
- Add support for specifying query parameters in the command line in clickhouse-local #58210 (Pradeep Chhetri).
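The function-level improvements in this list are easy to check interactively. The queries below are a sketch; the `Enum` definition in the last query is a made-up example type.

```sql
-- `format` now accepts non-string arguments such as the integer 42.
SELECT format('The {0} to all questions is {1}', 'answer', 42);

-- The unit name passed to `date_trunc` is case-insensitive,
-- so both expressions should produce the same value.
SELECT date_trunc('day', now()) AS lower_case_unit,
       date_trunc('DAY', now()) AS upper_case_unit;

-- `substring` applied to an Enum value (hypothetical Enum used for illustration).
SELECT substring(CAST('apple' AS Enum('apple' = 1, 'pear' = 2)), 1, 3);
```

On versions before 23.12 the first and third queries are expected to fail with an illegal argument type error, which makes them a quick way to confirm which behaviour a given server has.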
Build/Testing/Packaging Improvement
- Randomize more settings #39663 (Anton Popov).
- Randomize disabled optimizations in CI #57315 (Raúl Marín).
- Allow usage of Azure-related table engines/functions on macOS. #51866 (Alexey Milovidov).
- ClickHouse Fast Test now uses Musl instead of GLibc. #57711 (Alexey Milovidov). The fully-static Musl build is available to download from the CI.
- Run ClickBench for every commit. This closes #57708. #57712 (Alexey Milovidov).
- Remove the usage of a harmful C/POSIX `select` function from external libraries. #57467 (Igor Nikonov).
- Settings only available in ClickHouse Cloud will be also present in the open-source ClickHouse build for convenience. #57638 (Nikita Mikhaylov).
Bug Fix (user-visible misbehavior in an official stable release)
- Fixed a possibility of sorting order breakage in TTL GROUP BY #49103 (Nikita Mikhaylov).
- Fix: split `lttb` bucket strategy, first bucket and last bucket should only contain single point #57003 (FFish).
- Fix possible deadlock in the `Template` format during sync after error #57004 (Kruglov Pavel).
- Fix early stop while parsing a file with skipping lots of errors #57006 (Kruglov Pavel).
- Prevent dictionary's ACL bypass via the `dictionary` table function #57362 (Salvatore Mesoraca).
- Fix another case of a "non-ready set" error found by Fuzzer. #57423 (Nikolai Kochetov).
- Fix several issues regarding PostgreSQL `array_ndims` usage. #57436 (Ryan Jacobs).
- Fix RWLock inconsistency after write lock timeout #57454 (Vitaly Baranov). Fix RWLock inconsistency after write lock timeout (again) #57733 (Vitaly Baranov).
- Fix: don't exclude ephemeral column when building pushing to view chain #57461 (Yakov Olkhovskiy).
- MaterializedPostgreSQL (experimental issue): fix issue #41922, add test for #41923 #57515 (Kseniia Sumarokova).
- Ignore ON CLUSTER clause in grant/revoke queries for management of replicated access entities. #57538 (MikhailBurdukov).
- Fix crash in clickhouse-local #57553 (Nikolay Degterinsky).
- A fix for Hash JOIN. #57564 (vdimir).
- Fix possible error in PostgreSQL source #57567 (Kseniia Sumarokova).
- Fix type correction in Hash JOIN for nested LowCardinality. #57614 (vdimir).
- Avoid hangs of `system.stack_trace` by correctly prohibiting parallel reading from it. #57641 (Azat Khuzhin).
- Fix an error for aggregation of sparse columns with `any(...) RESPECT NULL` #57710 (Azat Khuzhin).
- Fix unary operators parsing #57713 (Nikolay Degterinsky).
- Fix dependency loading for the experimental table engine `MaterializedPostgreSQL`. #57754 (Kseniia Sumarokova).
- Fix retries for disconnected nodes for BACKUP/RESTORE ON CLUSTER #57764 (Vitaly Baranov).
- Fix result of external aggregation in case of partially materialized projection #57790 (Anton Popov).
- Fix merge in aggregation functions with `*Map` combinator #57795 (Anton Popov).
- Disable `system.kafka_consumers` because it has a bug. #57822 (Azat Khuzhin).
- Fix LowCardinality keys support in Merge JOIN. #57827 (vdimir).
- A fix for `InterpreterCreateQuery` related to the sample block. #57855 (Maksim Kita).
- `addresses_expr` were ignored for named collections from PostgreSQL. #57874 (joelynch).
- Fix invalid memory access in BLAKE3 (Rust) #57876 (Raúl Marín). Then it was rewritten from Rust to C++ for better memory-safety. #57994 (Raúl Marín).
- Normalize function names in `CREATE INDEX` #57906 (Alexander Tokmakov).
- Fix handling of unavailable replicas before first request happened #57933 (Nikita Taranov).
- Fix literal alias misclassification #57988 (Chen768959).
- Fix invalid preprocessing on Keeper #58069 (Antonio Andelic).
- Fix integer overflow in the `Poco` library, related to `UTF32Encoding` #58073 (Andrey Fedotov).
- Fix parallel replicas (experimental feature) in presence of a scalar subquery with a big integer value #58118 (Alexey Milovidov).
- Fix `accurateCastOrNull` for out-of-range `DateTime` #58139 (Andrey Zvonov).
- Fix possible `PARAMETER_OUT_OF_BOUND` error during subcolumns reading from a wide part in MergeTree #58175 (Kruglov Pavel).
- Fix a slow-down of CREATE VIEW with an enormous number of subqueries #58220 (Tao Wang).
- Fix parallel parsing for JSONCompactEachRow #58181 (Alexey Milovidov). #58250 (Kruglov Pavel).
ClickHouse release 23.11, 2023-12-06
Backward Incompatible Change
- The default ClickHouse server configuration file has enabled `access_management` (user manipulation by SQL queries) and `named_collection_control` (manipulation of named collection by SQL queries) for the `default` user by default. This closes #56482. #56619 (Alexey Milovidov).
- Multiple improvements for `RESPECT NULLS`/`IGNORE NULLS` for window functions. If you use them as aggregate functions and store the states of aggregate functions with these modifiers, they might become incompatible. #57189 (Raúl Marín).
- Remove optimization `optimize_move_functions_out_of_any`. #57190 (Raúl Marín).
- Formatters `%l`/`%k`/`%c` in function `parseDateTime` are now able to parse hours/months without leading zeros, e.g. `select parseDateTime('2023-11-26 8:14', '%F %k:%i')` now works. Set `parsedatetime_parse_without_leading_zeros = 0` to restore the previous behavior which required two digits. Function `formatDateTime` is now also able to print hours/months without leading zeros. This is controlled by setting `formatdatetime_format_without_leading_zeros` but off by default to not break existing use cases. #55872 (Azat Khuzhin).
- You can no longer use the aggregate function `avgWeighted` with arguments of type `Decimal`. Workaround: convert arguments to `Float64`. This closes #43928. This closes #31768. This closes #56435. If you have used this function inside materialized views or projections with `Decimal` arguments, contact support@clickhouse.com. Fixed error in aggregate function `sumMap` and made it slower around 1.5..2 times. It does not matter because the function is garbage anyway. This closes #54955. This closes #53134. This closes #55148. Fix a bug in function `groupArraySample` - it used the same random seed in case more than one aggregate state is generated in a query. #56350 (Alexey Milovidov).
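Two of the incompatible changes above come with a documented way to adapt. The sketch below shows both; the subquery that produces `price` and `weight` is made-up sample data.

```sql
-- Hours without a leading zero are now parsed by %k (also %l and %c for months).
SELECT parseDateTime('2023-11-26 8:14', '%F %k:%i');

-- To restore the previous strict behaviour (two digits required):
SET parsedatetime_parse_without_leading_zeros = 0;

-- avgWeighted no longer accepts Decimal arguments.
-- The workaround from the entry above is an explicit cast to Float64.
SELECT avgWeighted(toFloat64(price), toFloat64(weight)) AS weighted_avg
FROM
(
    -- Made-up Decimal sample data for illustration.
    SELECT
        toDecimal64(number + 1, 2) AS price,
        toDecimal64(number + 2, 2) AS weight
    FROM numbers(3)
);
```

The cast trades exact decimal arithmetic for floating-point arithmetic, so results may differ in the last digits from what a `Decimal`-based computation would have returned.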
New Feature
- Added server setting `async_load_databases` for asynchronous loading of databases and tables. Speeds up the server start time. Applies to databases with `Ordinary`, `Atomic` and `Replicated` engines. Their tables load metadata asynchronously. Query to a table increases the priority of the load job and waits for it to be done. Added a new table `system.asynchronous_loader` for introspection. #49351 (Sergei Trifonov).
- Add system table `blob_storage_log`. It allows auditing all the data written to S3 and other object storages. #52918 (vdimir).
- Use statistics to order prewhere conditions better. #53240 (Han Fei).
- Added support for compression in the Keeper's protocol. It can be enabled on the ClickHouse side by using this flag `use_compression` inside `zookeeper` section. Keep in mind that only ClickHouse Keeper supports compression, while Apache ZooKeeper does not. Resolves #49507. #54957 (SmitaRKulkarni).
- Introduce the feature `storage_metadata_write_full_object_key`. If it is set as `true` then metadata files are written with the new format. With that format ClickHouse stores full remote object key in the metadata file which allows better flexibility and optimization. #55566 (Sema Checherinda).
- Add new settings and syntax to protect named collections' fields from being overridden. This is meant to prevent a malicious user from obtaining unauthorized access to secrets. #55782 (Salvatore Mesoraca).
- Add `hostname` column to all system log tables - it is useful if you make the system tables replicated, shared, or distributed. #55894 (Bharat Nallan).
- Add `CHECK ALL TABLES` query. #56022 (vdimir).
- Added function `fromDaysSinceYearZero` which is similar to MySQL's `FROM_DAYS`. E.g. `SELECT fromDaysSinceYearZero(739136)` returns `2023-09-08`. #56088 (Joanna Hulboj).
- Add an external Python tool to view backups and to extract information from them without using ClickHouse. #56268 (Vitaly Baranov).
- Implement a new setting called `preferred_optimize_projection_name`. If it is set to a non-empty string, the specified projection would be used if possible instead of choosing from all the candidates. #56309 (Yarik Briukhovetskyi).
- Add 4-letter command for yielding/resigning leadership (https://github.com/ClickHouse/ClickHouse/issues/56352). #56354 (Pradeep Chhetri). #56620 (Pradeep Chhetri).
- Added a new SQL function, `arrayRandomSample(arr, k)` which returns a sample of k elements from the input array. Similar functionality could previously be achieved only with less convenient syntax, e.g. `SELECT arrayReduce('groupArraySample(3)', range(10))`. #56416 (Robert Schulze).
- Added support for `Float16` type data to use in `.npy` files. Closes #56344. #56424 (Yarik Briukhovetskyi).
- Added a system view `information_schema.statistics` for better compatibility with Tableau Online. #56425 (Serge Klochkov).
- Add `system.symbols` table useful for introspection of the binary. #56548 (Alexey Milovidov).
- Configurable dashboards. Queries for charts are now loaded using a query, which by default uses a new `system.dashboards` table. #56771 (Sergei Trifonov).
- Introduce `fileCluster` table function - it is useful if you mount a shared filesystem (NFS and similar) into the `user_files` directory. #56868 (Andrey Zvonov).
- Add `_size` virtual column with file size in bytes to `s3/file/hdfs/url/azureBlobStorage` engines. #57126 (Kruglov Pavel).
- Expose the number of errors for each error code occurred on a server since last restart from the Prometheus endpoint. #57209 (Nikita Mikhaylov).
- ClickHouse keeper reports its running availability zone at `/keeper/availability-zone` path. This can be configured via `<availability_zone><value>us-west-1a</value></availability_zone>`. #56715 (Jianfei Hu).
- Make ALTER materialized_view MODIFY QUERY non experimental and deprecate `allow_experimental_alter_materialized_view_structure` setting. Fixes #15206. #57311 (alesapin).
- Setting `join_algorithm` respects specified order #51745 (vdimir).
- Add support for the well-known Protobuf types in the Protobuf format. #56741 (János Benjamin Antal).
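Several of the 23.11 additions are plain SQL functions, queries or system tables, so they can be explored directly. The queries below are a sketch; the output of the random-sampling and system-table queries varies between runs and servers, so no results are shown.

```sql
-- MySQL-style FROM_DAYS equivalent; the entry above gives 2023-09-08 for this input.
SELECT fromDaysSinceYearZero(739136);

-- New, shorter way to draw 3 random elements from an array...
SELECT arrayRandomSample(range(10), 3);

-- ...compared with the older, less convenient spelling.
SELECT arrayReduce('groupArraySample(3)', range(10));

-- Check every table on the server in one statement.
CHECK ALL TABLES;

-- Introspection tables introduced in this release.
SELECT * FROM system.asynchronous_loader LIMIT 5;
SELECT * FROM system.dashboards LIMIT 5;
```

`CHECK ALL TABLES` may take a long time on servers with many large tables, so it is better tried on a test instance first.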
Performance Improvement
- Adaptive timeouts for interacting with S3. The first attempt is made with low send and receive timeouts. #56314 (Sema Checherinda).
- Increase the default value of `max_concurrent_queries` from 100 to 1000. This makes sense when there is a large number of connecting clients, which are slowly sending or receiving data, so the server is not limited by CPU, or when the number of CPU cores is larger than 100. Also, enable the concurrency control by default, and set the desired number of query processing threads in total as twice the number of CPU cores. It improves performance in scenarios with a very large number of concurrent queries. #46927 (Alexey Milovidov).
- Support parallel evaluation of window functions. Fixes #34688. #39631 (Dmitry Novik).
- `Numbers` table engine (of the `system.numbers` table) now analyzes the condition to generate the needed subset of data, like table's index. #50909 (JackyWoo).
- Improved the performance of filtering by `IN (...)` condition for `Merge` table engine. #54905 (Nikita Taranov).
- An improvement which takes place when the filesystem cache is full and there are big reads. #55158 (Kseniia Sumarokova).
- Add ability to disable checksums for S3 to avoid excessive pass over the file (this is controlled by the setting `s3_disable_checksum`). #55559 (Azat Khuzhin).
- Now we read synchronously from remote tables when data is in page cache (like we do for local tables). It is faster, it doesn't require synchronisation inside the thread pool, and doesn't hesitate to do `seek`-s on local FS, and reduces CPU wait. #55841 (Nikita Taranov).
- Optimization for getting value from `map`, `arrayElement`. It will bring about 30% speedup by reducing the reserved memory and reducing the `resize` calls. #55957 (lgbo).
- Optimization of multi-stage filtering with AVX-512. The performance experiments of the OnTime dataset on the ICX device (Intel Xeon Platinum 8380 CPU, 80 cores, 160 threads) show that this change could bring the improvements of 7.4%, 5.9%, 4.7%, 3.0%, and 4.6% to the QPS of the query Q2, Q3, Q4, Q5 and Q6 respectively while having no impact on others. #56079 (Zhiguo Zhou).
- Limit the number of threads busy inside the query profiler. If there are more - they will skip profiling. #56105 (Alexey Milovidov).
- Decrease the amount of virtual function calls in window functions. #56120 (Maksim Kita).
- Allow recursive Tuple field pruning in ORC data format to speed up scanning. #56122 (李扬).
- Trivial count optimization for `Npy` data format: queries like `select count() from 'data.npy'` will work much faster because of caching the results. #56304 (Yarik Briukhovetskyi).
- Queries with aggregation and a large number of streams will use less memory during the plan's construction. #57074 (Alexey Milovidov).
- Improve performance of executing queries for use cases with many users and highly concurrent queries (>2000 QPS) by optimizing the access to ProcessList. #57106 (Andrej Hoos).
- Trivial improvement on array join, reuse some intermediate results. #57183 (李扬).
- There are cases when stack unwinding was slow. Not anymore. #57221 (Alexey Milovidov).
- Now we use default read pool for reading from external storage when `max_streams = 1`. It is beneficial when read prefetches are enabled. #57334 (Nikita Taranov).
- Keeper improvement: improve memory-usage during startup by delaying log preprocessing. #55660 (Antonio Andelic).
- Improved performance of glob matching for `File` and `HDFS` storages. #56141 (Andrey Zvonov).
- Posting lists in experimental full text indexes are now compressed which reduces their size by 10-30%. #56226 (Harry Lee).
- Parallelise `BackupEntriesCollector` in backups. #56312 (Kseniia Sumarokova).
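The `system.numbers` change can be observed with a filtered query, as in the sketch below. The file name `data.npy` in the second query is the one used in the entry above and is assumed to exist in the server's `user_files` directory.

```sql
-- With condition analysis, only the requested range needs to be generated.
-- LIMIT is kept so the query also terminates on older versions,
-- where the table is scanned from zero without bound.
SELECT number
FROM system.numbers
WHERE number BETWEEN 100 AND 105
LIMIT 6;

-- Trivial count over an Npy file (assumes data.npy exists under user_files).
SELECT count() FROM file('data.npy', Npy);

-- Skip S3 checksum calculation for this session.
SET s3_disable_checksum = 1;
```

Disabling checksums removes an extra pass over the data at the cost of one integrity check, so it is a trade-off rather than a default recommendation.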
Improvement
- Add a new `MergeTree` setting `add_implicit_sign_column_constraint_for_collapsing_engine` (disabled by default). When enabled, it adds an implicit CHECK constraint for `CollapsingMergeTree` tables that restricts the value of the `Sign` column to be only -1 or 1. #56701. #56986 (Kevin Mingtarja).
- Enable adding new disk to storage configuration without restart. #56367 (Duc Canh Le).
- Support creating and materializing index in the same alter query, also support "modify TTL" and "materialize TTL" in the same query. Closes #55651. #56331 (flynn).
- Add a new table function named `fuzzJSON` with rows containing perturbed versions of the source JSON string with random variations. #56490 (Julia Kartseva).
- Engine `Merge` filters the records according to the row policies of the underlying tables, so you don't have to create another row policy on a `Merge` table. #50209 (Ilya Golshtein).
- Add a setting `max_execution_time_leaf` to limit the execution time on shard for distributed query, and `timeout_overflow_mode_leaf` to control the behaviour if timeout happens. #51823 (Duc Canh Le).
- Add ClickHouse setting to disable tunneling for HTTPS requests over HTTP proxy. #55033 (Arthur Passos).
- Set `background_fetches_pool_size` to 16, `background_schedule_pool_size` to 512 that is better for production usage with frequent small insertions. #54327 (Denny Crane).
- When reading a CSV file in which a line ends with `\r` that is not followed by `\n`, ClickHouse throws the exception `Cannot parse CSV format: found \r (CR) not followed by \n (LF). Line must end by \n (LF) or \r\n (CR LF) or \n\r.` In ClickHouse, a CSV line must end with `\n`, `\r\n` or `\n\r`, so `\r` must be followed by `\n`. In some situations, however, the CSV input data is abnormal and, as above, has a bare `\r` at the end of a line. #54340 (KevinyhZou).
- Update Arrow library to release-13.0.0 that supports new encodings. Closes #44505. #54800 (Kruglov Pavel).
- Improve performance of ON CLUSTER queries by removing heavy system calls to get all network interfaces when looking for local ip address in the DDL entry hosts list. #54909 (Duc Canh Le).
- Fixed accounting of memory allocated before attaching a thread to a query or a user. #56089 (Nikita Taranov).
- Add support for `LARGE_LIST` in Apache Arrow formats. #56118 (edef).
- Allow manual compaction of `EmbeddedRocksDB` via `OPTIMIZE` query. #56225 (Azat Khuzhin).
- Add ability to specify BlockBasedTableOptions for `EmbeddedRocksDB` tables. #56264 (Azat Khuzhin).
- `SHOW COLUMNS` now displays MySQL's equivalent data type name when the connection was made through the MySQL protocol. Previously, this was the case when setting `use_mysql_types_in_show_columns = 1`. The setting is retained but made obsolete. #56277 (Robert Schulze).
- Fixed possible `The local set of parts of table doesn't look like the set of parts in ZooKeeper` error if server was restarted just after `TRUNCATE` or `DROP PARTITION`. #56282 (Alexander Tokmakov).
- Fixed handling of non-const query strings in functions `formatQuery`/`formatQuerySingleLine`. Also added `OrNull` variants of both functions that return a NULL when a query cannot be parsed instead of throwing an exception. #56327 (Robert Schulze).
- Allow backup of materialized view with dropped inner table instead of failing the backup. #56387 (Kseniia Sumarokova).
- Queries to `system.replicas` initiate requests to ZooKeeper when certain columns are queried. When there are thousands of tables these requests might produce a considerable load on ZooKeeper. If there are multiple simultaneous queries to `system.replicas` they do the same requests multiple times. The change is to "deduplicate" requests from concurrent queries. #56420 (Alexander Gololobov).
- Fix translation to MySQL compatible query for querying external databases. #56456 (flynn).
- Add support for backing up and restoring tables using `KeeperMap` engine. #56460 (Antonio Andelic).
- 404 response for CompleteMultipartUpload has to be rechecked. Operation could be done on server even if client got timeout or other network errors. The next retry of CompleteMultipartUpload receives 404 response. If the object key exists that operation is considered as successful. #56475 (Sema Checherinda).
- Enable the HTTP OPTIONS method by default - it simplifies requesting ClickHouse from a web browser. #56483 (Alexey Milovidov).
- The value for `dns_max_consecutive_failures` was changed by mistake in #46550 - this is reverted and adjusted to a better value. Also, increased the HTTP keep-alive timeout to a reasonable value from production. #56485 (Alexey Milovidov).
- Load base backups lazily (a base backup won't be loaded until it's needed). Also add some log message and profile events for backups. #56516 (Vitaly Baranov).
- Setting `query_cache_store_results_of_queries_with_nondeterministic_functions` (with values `false` or `true`) was marked obsolete. It was replaced by setting `query_cache_nondeterministic_function_handling`, a three-valued enum that controls how the query cache handles queries with non-deterministic functions: a) throw an exception (default behavior), b) save the non-deterministic query result regardless, or c) ignore, i.e. don't throw an exception and don't cache the result. #56519 (Robert Schulze).
- Rewrite equality with `is null` check in JOIN ON section. Experimental Analyzer only. #56538 (vdimir).
- Function `concat` now supports arbitrary argument types (instead of only String and FixedString arguments). This makes it behave more similar to MySQL `concat` implementation. For example, `SELECT concat('ab', 42)` now returns `ab42`. #56540 (Serge Klochkov).
- Allow getting cache configuration from 'named_collection' section in config or from SQL created named collections. #56541 (Kseniia Sumarokova).
- PostgreSQL database engine: Make the removal of outdated tables less aggressive with unsuccessful postgres connection. #56609 (jsc0218).
- It took too much time to connect to PostgreSQL when the URL was not right, so the relevant query got stuck there and was cancelled. #56648 (jsc0218).
- Keeper improvement: disable compressed logs by default in Keeper. #56763 (Antonio Andelic).
- Add config setting `wait_dictionaries_load_at_startup`. #56782 (Vitaly Baranov).
- There was a potential vulnerability in previous ClickHouse versions: if a user has connected and unsuccessfully tried to authenticate with the "interserver secret" method, the server didn't terminate the connection immediately but continued to receive and ignore the leftover packets from the client. While these packets are ignored, they are still parsed, and if they use a compression method with another known vulnerability, it will lead to exploitation of it without authentication. This issue was found with ClickHouse Bug Bounty Program by https://twitter.com/malacupa. #56794 (Alexey Milovidov).
- Fetching a part waits until that part is fully committed on the remote replica. It is better not to send a part in the PreActive state. In case of zero copy this is a mandatory restriction. #56808 (Sema Checherinda).
- Fix possible postgresql logical replication conversion error when using experimental `MaterializedPostgreSQL`. #53721 (takakawa).
- Implement user-level setting `alter_move_to_space_execute_async` which allows executing queries `ALTER TABLE ... MOVE PARTITION|PART TO DISK|VOLUME` asynchronously. The size of pool for background executions is controlled by `background_move_pool_size`. Default behavior is synchronous execution. Fixes #47643. #56809 (alesapin).
- Able to filter by engine when scanning system.tables, avoid unnecessary (potentially time-consuming) connection. #56813 (jsc0218).
- Show `total_bytes` and `total_rows` in system tables for RocksDB storage. #56816 (Aleksandr Musorin).
- Allow basic commands in ALTER for TEMPORARY tables. #56892 (Sergey).
- LZ4 compression. Buffer compressed block in a rare case when out buffer capacity is not enough for writing compressed block directly to out's buffer. #56938 (Sema Checherinda).
- Add metrics for the number of queued jobs, which is useful for the IO thread pool. #56958 (Alexey Milovidov).
- Add a setting for PostgreSQL table engine setting in the config file. Added a check for the setting. Added documentation around the additional setting. #56959 (Peignon Melvyn).
- Function `concat` can now be called with a single argument, e.g., `SELECT concat('abc')`. This makes its behavior more consistent with MySQL's concat implementation. #57000 (Serge Klochkov).
- Signs all `x-amz-*` headers as required by AWS S3 docs. #57001 (Arthur Passos).
- Function `fromDaysSinceYearZero` (alias: `FROM_DAYS`) can now be used with unsigned and signed integer types (previously, it had to be an unsigned integer). This improves compatibility with 3rd party tools such as Tableau Online. #57002 (Serge Klochkov).
- Add `system.s3queue_log` to default config. #57036 (Kseniia Sumarokova).
- Change the default for `wait_dictionaries_load_at_startup` to true, and use this setting only if `dictionaries_lazy_load` is false. #57133 (Vitaly Baranov).
- Check dictionary source type on creation even if `dictionaries_lazy_load` is enabled. #57134 (Vitaly Baranov).
- Plan-level optimizations can now be enabled/disabled individually. Previously, it was only possible to disable them all. The setting which previously did that (`query_plan_enable_optimizations`) is retained and can still be used to disable all optimizations. #57152 (Robert Schulze).
- The server's exit code will correspond to the exception code. For example, if the server cannot start due to memory limit, it will exit with the code 241 = MEMORY_LIMIT_EXCEEDED. In previous versions, the exit code for exceptions was always 70 = Poco::Util::ExitCode::EXIT_SOFTWARE. #57153 (Alexey Milovidov).
- Do not demangle and symbolize stack frames from `functional` C++ header. #57201 (Mike Kot).
- HTTP server page `/dashboard` now supports charts with multiple lines. #57236 (Sergei Trifonov).
- The `max_memory_usage_in_client` command line option supports a string value with a suffix (K, M, G, etc). Closes #56879. #57273 (Yarik Briukhovetskyi).
- Bumped Intel QPL (used by codec `DEFLATE_QPL`) from v1.2.0 to v1.3.1. Also fixed a bug in case of BOF (Block On Fault) = 0, changed to handle page faults by falling back to SW path. #57291 (jasperzhu).
- Increase default `replicated_deduplication_window` of MergeTree settings from 100 to 1k. #57335 (sichenzhao).
- Stop using `INCONSISTENT_METADATA_FOR_BACKUP` that much. If possible prefer to continue scanning instead of stopping and starting the scanning for backup from the beginning. #57385 (Vitaly Baranov).
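The three modes of `query_cache_nondeterministic_function_handling` listed in this section (throw, save, ignore) can be pictured with a small Python model. This is only a sketch of the behavior as the changelog entry describes it, not ClickHouse code: the names `Handling`, `NondeterministicQueryError` and `run_with_cache`, and the dictionary standing in for the query cache, are hypothetical.

```python
from enum import Enum


class Handling(Enum):
    THROW = "throw"    # a) raise an error (the default)
    SAVE = "save"      # b) cache the result regardless
    IGNORE = "ignore"  # c) neither raise nor cache


class NondeterministicQueryError(Exception):
    """Hypothetical error type used only in this sketch."""


def run_with_cache(cache, query, result, nondeterministic, handling=Handling.THROW):
    """Return `result`, storing it in `cache` according to `handling`."""
    if nondeterministic:
        if handling is Handling.THROW:
            raise NondeterministicQueryError(query)
        if handling is Handling.IGNORE:
            return result  # the result is returned but not cached
    # Deterministic queries, and non-deterministic ones in SAVE mode, are cached.
    cache[query] = result
    return result
```

With `Handling.IGNORE` the caller still receives the result, but a later identical query will not find it in the cache; with the default `Handling.THROW` the query fails instead of silently caching a value that may be stale.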
Build/Testing/Packaging Improvement
- Add SQLLogic test. #56078 (Han Fei).
- Make `clickhouse-local` and `clickhouse-client` available under short names (`ch`, `chl`, `chc`) for usability. #56634 (Alexey Milovidov).
- Optimized build size further by removing unused code from external libraries. #56786 (Alexey Milovidov).
- Add automatic check that there are no large translation units. #56559 (Alexey Milovidov).
- Lower the size of the single-binary distribution. This closes #55181. #56617 (Alexey Milovidov).
- Information about the sizes of every translation unit and binary file after each build will be sent to the CI database in ClickHouse Cloud. This closes #56107. #56636 (Alexey Milovidov).
- Certain files of "Apache Arrow" library (which we use only for non-essential things like parsing the arrow format) were rebuilt all the time regardless of the build cache. This is fixed. #56657 (Alexey Milovidov).
- Avoid recompiling translation units depending on the autogenerated source file about version. #56660 (Alexey Milovidov).
- Tracing data of the linker invocations will be sent to the CI database in ClickHouse Cloud. #56725 (Alexey Milovidov).
- Use DWARF 5 debug symbols for the clickhouse binary (was DWARF 4 previously). #56770 (Michael Kolupaev).
- Add a new build option `SANITIZE_COVERAGE`. If it is enabled, the code is instrumented to track the coverage. The collected information is available inside ClickHouse with: (1) a new function `coverage` that returns an array of unique addresses in the code found after the previous coverage reset; (2) `SYSTEM RESET COVERAGE` query that resets the accumulated data. This allows us to compare the coverage of different tests, including differential code coverage. Continuation of #20539. #56102 (Alexey Milovidov).
- Some of the stack frames might not be resolved when collecting stacks. In such cases the raw address might be helpful. #56267 (Alexander Gololobov).
- Add an option to disable `libssh`. #56333 (Alexey Milovidov).
- Enable temporary_data_in_cache in S3 tests in CI. #48425 (vdimir).
- Set the max memory usage for clickhouse-client (`1G`) in the CI. #56873 (Nikita Mikhaylov).
Bug Fix (user-visible misbehavior in an official stable release)
- Fix experimental Analyzer - insertion from select with subquery referencing insertion table should process only insertion block. #50857 (Yakov Olkhovskiy).
- Fix a bug in `str_to_map` function. #56423 (Arthur Passos).
- Keeper `reconfig`: add timeout before yielding/taking leadership #53481 (Mike Kot).
- Fix incorrect header in grace hash join and filter pushdown #53922 (vdimir).
- Select from system tables when table based on table function. #55540 (MikhailBurdukov).
- RFC: Fix "Cannot find column X in source stream" for Distributed queries with LIMIT BY #55836 (Azat Khuzhin).
- Fix 'Cannot read from file:' while running client in a background #55976 (Kruglov Pavel).
- Fix clickhouse-local exit on bad send_logs_level setting #55994 (Kruglov Pavel).
- Bug fix explain ast with parameterized view #56004 (SmitaRKulkarni).
- Fix a crash during table loading on startup #56232 (Nikolay Degterinsky).
- Fix ClickHouse-sourced dictionaries with an explicit query #56236 (Nikolay Degterinsky).
- Fix segfault in signal handler for Keeper #56266 (Antonio Andelic).
- Fix incomplete query result for UNION in view() function. #56274 (Nikolai Kochetov).
- Fix inconsistency of "cast('0' as DateTime64(3))" and "cast('0' as Nullable(DateTime64(3)))" #56286 (李扬).
- Fix rare race condition related to Memory allocation failure #56303 (alesapin).
- Fix restore from backup with `flatten_nested` and `data_type_default_nullable` #56306 (Kseniia Sumarokova).
- Fix crash in case of adding a column with type Object(JSON) #56307 (Nikita Mikhaylov).
- Fix crash in filterPushDown #56380 (vdimir).
- Fix restore from backup with mat view and dropped source table #56383 (Kseniia Sumarokova).
- Fix segfault during Kerberos initialization #56401 (Nikolay Degterinsky).
- Fix buffer overflow in T64 #56434 (Alexey Milovidov).
- Fix nullable primary key in final (2) #56452 (Amos Bird).
- Fix ON CLUSTER queries without database on initial node #56484 (Nikolay Degterinsky).
- Fix startup failure due to TTL dependency #56489 (Nikolay Degterinsky).
- Fix ALTER COMMENT queries ON CLUSTER #56491 (Nikolay Degterinsky).
- Fix ALTER COLUMN with ALIAS #56493 (Nikolay Degterinsky).
- Fix empty NAMED COLLECTIONs #56494 (Nikolay Degterinsky).
- Fix two cases of projection analysis. #56502 (Amos Bird).
- Fix handling of aliases in query cache #56545 (Robert Schulze).
- Fix conversion from `Nullable(Enum)` to `Nullable(String)` #56644 (Nikolay Degterinsky).
- More reliable log handling in Keeper #56670 (Antonio Andelic).
- Fix configuration merge for nodes with substitution attributes #56694 (Konstantin Bogdanov).
- Fix duplicate usage of table function input(). #56695 (Nikolai Kochetov).
- Fix: RabbitMQ OpenSSL dynamic loading issue #56703 (Igor Nikonov).
- Fix crash in GCD codec in case when zeros present in data #56704 (Nikita Mikhaylov).
- Fix 'mutex lock failed: Invalid argument' in clickhouse-local during insert into function #56710 (Kruglov Pavel).
- Fix Date text parsing in optimistic path #56765 (Kruglov Pavel).
- Fix crash in FPC codec #56795 (Alexey Milovidov).
- DatabaseReplicated: fix DDL query timeout after recovering a replica #56796 (Alexander Tokmakov).
- Fix incorrect nullable columns reporting in MySQL binary protocol #56799 (Serge Klochkov).
- Support Iceberg metadata files for metastore tables #56810 (Kruglov Pavel).
- Fix TSAN report under transform #56817 (Raúl Marín).
- Fix SET query and SETTINGS formatting #56825 (Nikolay Degterinsky).
- Fix failure to start due to table dependency in joinGet #56828 (Nikolay Degterinsky).
- Fix flattening existing Nested columns during ADD COLUMN #56830 (Nikolay Degterinsky).
- Fix allow cr end of line for csv #56901 (KevinyhZou).
- Fix `tryBase64Decode` with invalid input #56913 (Robert Schulze).
- Fix generating deep nested columns in CapnProto/Protobuf schemas #56941 (Kruglov Pavel).
- Prevent incompatible ALTER of projection columns #56948 (Amos Bird).
- Fix sqlite file path validation #56984 (San).
- S3Queue: fix metadata reference increment #56990 (Kseniia Sumarokova).
- S3Queue minor fix #56999 (Kseniia Sumarokova).
- Fix file path validation for DatabaseFileSystem #57029 (San).
- Fix `fuzzBits` with `ARRAY JOIN` #57033 (Antonio Andelic).
- Fix Nullptr dereference in partial merge join with joined_subquery_re… #57048 (vdimir).
- Fix race condition in RemoteSource #57052 (Raúl Marín).
- Implement `bitHammingDistance` for big integers #57073 (Alexey Milovidov).
- S3-style links bug fix #57075 (Yarik Briukhovetskyi).
- Fix JSON_QUERY function with multiple numeric paths #57096 (KevinyhZou).
- Fix buffer overflow in Gorilla codec #57107 (Nikolay Degterinsky).
- Close interserver connection on any exception before authentication #57142 (Antonio Andelic).
- Fix segfault after ALTER UPDATE with Nullable MATERIALIZED column #57147 (Nikolay Degterinsky).
- Fix incorrect JOIN plan optimization with partially materialized normal projection #57196 (Amos Bird).
- Ignore comments when comparing column descriptions #57259 (Antonio Andelic).
- Fix `ReadonlyReplica` metric for all cases #57267 (Antonio Andelic).
- Background merges correctly use temporary data storage in the cache #57275 (vdimir).
- Keeper fix for changelog and snapshots #57299 (Antonio Andelic).
- Ignore finished ON CLUSTER tasks if hostname changed #57339 (Alexander Tokmakov).
- MergeTree mutations reuse source part index granularity #57352 (Maksim Kita).
- FS cache: add a limit for background download #57424 (Kseniia Sumarokova).
ClickHouse release 23.10, 2023-11-02
Backward Incompatible Change
- There is no longer an option to automatically remove broken data parts. This closes #55174. #55184 (Alexey Milovidov). #55557 (Jihyuk Bok).
- The obsolete in-memory data parts can no longer be read from the write-ahead log. If you have configured in-memory parts before, they have to be removed before the upgrade. #55186 (Alexey Milovidov).
- Remove the integration with Meilisearch. Reason: it was compatible only with the old version 0.18. The recent version of Meilisearch changed the protocol and does not work anymore. Note: we would appreciate it if you help to return it back. #55189 (Alexey Milovidov).
- Rename directory monitor concept into background INSERT. All the settings `*directory_monitor*` had been renamed to `distributed_background_insert*`. Backward compatibility should be preserved (since old settings had been added as an alias). #55978 (Azat Khuzhin).
- Do not interpret the `send_timeout` set on the client side as the `receive_timeout` on the server side and vice versa. #56035 (Azat Khuzhin).
- Comparison of time intervals with different units will throw an exception. This closes #55942. You might have occasionally relied on the previous behavior when the underlying numeric values were compared regardless of the units. #56090 (Alexey Milovidov).
- Rewrote the experimental `S3Queue` table engine completely: changed the way we keep information in ZooKeeper, which allows making fewer ZooKeeper requests; added caching of ZooKeeper state in cases when we know the state will not change; improved the polling from S3 process to make it less aggressive; changed the way TTL and max set for tracked files is maintained, now it is a background process. Added `system.s3queue` and `system.s3queue_log` tables. Closes #54998. #54422 (Kseniia Sumarokova).
- Arbitrary paths on HTTP endpoint are no longer interpreted as a request to the `/query` endpoint. #55521 (Konstantin Bogdanov).
New Feature
- Add function `arrayFold(accumulator, x1, ..., xn -> expression, initial, array1, ..., arrayn)` which applies a lambda function to multiple arrays of the same cardinality and collects the result in an accumulator. #49794 (Lirikl).
- Support for `Npy` format. `SELECT * FROM file('example_array.npy', Npy)`. #55982 (Yarik Briukhovetskyi).
- If a table has a space-filling curve in its key, e.g., `ORDER BY mortonEncode(x, y)`, the conditions on its arguments, e.g., `x >= 10 AND x <= 20 AND y >= 20 AND y <= 30` can be used for indexing. A setting `analyze_index_with_space_filling_curves` is added to enable or disable this analysis. This closes #41195. Continuation of #4538. Continuation of #6286. Continuation of #28130. Continuation of #41753. #55642 (Alexey Milovidov).
- A new setting called `force_optimize_projection_name` takes the name of a projection as an argument. If its value is set to a non-empty string, ClickHouse checks that this projection is used in the query at least once. Closes #55331. #56134 (Yarik Briukhovetskyi).
- Support asynchronous inserts with external data via native protocol. Previously it worked only if data is inlined into query. #54730 (Anton Popov).
- Added aggregation function `lttb` which uses the Largest-Triangle-Three-Buckets algorithm for downsampling data for visualization. #53145 (Sinan).
- Query `CHECK TABLE` has better performance and usability (sends progress updates, cancellable). Support checking particular part with `CHECK TABLE ... PART 'part_name'`. #53404 (vdimir).
- Added function `jsonMergePatch`. When working with JSON data as strings, it provides a way to merge these strings (of JSON objects) together to form a single string containing a single JSON object. #54364 (Memo).
- The second part of Kusto Query Language dialect support. Phase 1 implementation has been merged. #42510 (larryluogit).
- Added a new SQL function, `arrayRandomSample(arr, k)` which returns a sample of k elements from the input array. Similar functionality could previously be achieved only with less convenient syntax, e.g. "SELECT arrayReduce('groupArraySample(3)', range(10))". #54391 (itayisraelov).
- Introduce `-ArgMin`/`-ArgMax` aggregate combinators which allow to aggregate by min/max values only. One use case can be found in #54818. This PR also reorganizes combinators into a dedicated folder. #54947 (Amos Bird).
- Allow to drop cache for Protobuf format with `SYSTEM DROP SCHEMA FORMAT CACHE [FOR Protobuf]`. #55064 (Aleksandr Musorin).
- Add external HTTP Basic authenticator. #55199 (Aleksei Filatov).
- Added function `byteSwap` which reverses the bytes of unsigned integers. This is particularly useful for reversing values of types which are represented as unsigned integers internally such as IPv4. #55211 (Priyansh Agrawal).
- Added function `formatQuery` which returns a formatted version (possibly spanning multiple lines) of a SQL query string. Also added function `formatQuerySingleLine` which does the same but the returned string will not contain linebreaks. #55239 (Salvatore Mesoraca).
- Added `DWARF` input format that reads debug symbols from an ELF executable/library/object file. #55450 (Michael Kolupaev).
- Allow to save unparsed records and errors in RabbitMQ, NATS and FileLog engines. Add virtual columns `_error` and `_raw_message` (for NATS and RabbitMQ), `_raw_record` (for FileLog) that are filled when ClickHouse fails to parse new record. The behaviour is controlled under storage settings `nats_handle_error_mode` for NATS, `rabbitmq_handle_error_mode` for RabbitMQ, `handle_error_mode` for FileLog similar to `kafka_handle_error_mode`. If it's set to `default`, an exception will be thrown when ClickHouse fails to parse a record, if it's set to `stream`, error and raw record will be saved into virtual columns. Closes #36035. #55477 (Kruglov Pavel).
- Keeper client improvement: add `get_all_children_number` command that returns number of all children nodes under a specific path. #55485 (guoxiaolong).
- Keeper client improvement: add `get_direct_children_number` command that returns number of direct children nodes under a path. #55898 (xuzifu666).
- Add statement `SHOW SETTING setting_name` which is a simpler version of existing statement `SHOW SETTINGS`. #55979 (Maksim Kita).
- Added fields `substreams` and `filenames` to the `system.parts_columns` table. #55108 (Anton Popov).
- Add support for `SHOW MERGES` query. #55815 (megao).
- Introduce a setting `create_table_empty_primary_key_by_default` for default `ORDER BY ()`. #55899 (Srikanth Chekuri).
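To make the `arrayFold` entry in this list more concrete, here is a rough Python model of a fold over several equally sized arrays. It only illustrates the idea of threading an accumulator through the elements. The helper name `array_fold` and its argument order are assumptions of this sketch and are not taken from ClickHouse.

```python
from functools import reduce


def array_fold(func, initial, *arrays):
    """Fold `func(acc, x1, ..., xn)` over the arrays element by element."""
    if len({len(a) for a in arrays}) > 1:
        raise ValueError("all arrays must have the same number of elements")
    # zip(*arrays) yields one tuple (x1, ..., xn) per position.
    return reduce(lambda acc, xs: func(acc, *xs), zip(*arrays), initial)


# Sum of one array: the accumulator starts at 0.
total = array_fold(lambda acc, x: acc + x, 0, [1, 2, 3])

# Dot product of two arrays of the same size.
dot = array_fold(lambda acc, x, y: acc + x * y, 0, [1, 2, 3], [4, 5, 6])
```

Here `total` is 6 and `dot` is 32 (1*4 + 2*5 + 3*6). The accumulator can be any value, so the same pattern also builds strings or lists.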
Performance Improvement
- Add option `query_plan_preserve_num_streams_after_window_functions` to preserve the number of streams after evaluating window functions to allow parallel stream processing. #50771 (frinkr).
- Release more streams if data is small. #53867 (Jiebin Sun).
- RoaringBitmaps are now optimized before serialization. #55044 (UnamedRus).
- Posting lists in inverted indexes are now optimized to use the smallest possible representation for internal bitmaps. Depending on the repetitiveness of the data, this may significantly reduce the space consumption of inverted indexes. #55069 (Harry Lee).
- Fix contention on Context lock, this significantly improves performance for a lot of short-running concurrent queries. #55121 (Maksim Kita).
- Improved the performance of inverted index creation by 30%. This was achieved by replacing `std::unordered_map` with `absl::flat_hash_map`. #55210 (Harry Lee).
- Support ORC filter push down (rowgroup level). #55330 (李扬).
- Improve performance of external aggregation with a lot of temporary files. #55489 (Maksim Kita).
- Set a reasonable size for the marks cache for secondary indices by default to avoid loading the marks over and over again. #55654 (Alexey Milovidov).
- Avoid unnecessary reconstruction of index granules when reading skip indexes. This addresses #55653. #55683 (Amos Bird).
- Cache CAST function in set during execution to improve the performance of function `IN` when set element type doesn't exactly match column type. #55712 (Duc Canh Le).
- Performance improvement for `ColumnVector::insertMany` and `ColumnVector::insertManyFrom`. #55714 (frinkr).
- Optimized Map subscript operations by predicting the next row's key position and reducing the number of comparisons. #55929 (lgbo).
- Support struct fields pruning in Parquet (in previous versions it didn't work in some cases). #56117 (lgbo).
- Add the ability to tune the number of parallel replicas used in a query execution based on the estimation of rows to read. #51692 (Raúl Marín).
- Optimized external aggregation memory consumption in case many temporary files were generated. #54798 (Nikita Taranov).
- Distributed queries executed in `async_socket_for_remote` mode (default) now respect `max_threads` limit. Previously, some queries could create excessive threads (up to `max_distributed_connections`), causing server performance issues. #53504 (filimonov).
- Caching skip-able entries while executing DDL from Zookeeper distributed DDL queue. #54828 (Duc Canh Le).
- Experimental inverted indexes do not store tokens with too many matches (i.e. row ids in the posting list). This saves space and avoids ineffective index lookups when sequential scans would be equally fast or faster. The previous heuristics (`density` parameter passed to the index definition) that controlled when tokens would not be stored was too confusing for users. A much simpler heuristics based on parameter `max_rows_per_postings_list` (default: 64k) is introduced which directly controls the maximum allowed number of row ids in a postings list. #55616 (Harry Lee).
- Improve write performance to `EmbeddedRocksDB` tables. #55732 (Duc Canh Le).
- Improved overall resilience for ClickHouse in case of many parts within partition (more than 1000). It might reduce the number of `TOO_MANY_PARTS` errors. #55526 (Nikita Mikhaylov).
- Reduced memory consumption during loading of hierarchical dictionaries. #55838 (Nikita Taranov).
- All dictionaries support setting `dictionary_use_async_executor`. #55839 (vdimir).
- Prevent excessive memory usage when deserializing AggregateFunctionTopKGenericData. #55947 (Raúl Marín).
- On a Keeper with lots of watches AsyncMetrics threads can consume 100% of CPU for noticeable time in `DB::KeeperStorage::getSessionsWithWatchesCount`. The fix is to avoid traversing heavy `watches` and `list_watches` sets. #56054 (Alexander Gololobov).
- Add setting `optimize_trivial_approximate_count_query` to use `count` approximation for storage EmbeddedRocksDB. Enable trivial count for StorageJoin. #55806 (Duc Canh Le).
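The `max_rows_per_postings_list` heuristic for inverted indexes mentioned in this list can be pictured as a cap on how many row ids one token may point to. The Python sketch below is a toy model only: the function name `build_postings`, the whitespace tokenizer, and the choice to keep lists whose length equals the limit are assumptions made for illustration, not a description of the ClickHouse implementation.

```python
from collections import defaultdict


def build_postings(rows, max_rows_per_postings_list):
    """Map each token to the ids of rows containing it, dropping overly common tokens."""
    postings = defaultdict(list)
    for row_id, text in enumerate(rows):
        for token in set(text.split()):  # count each token once per row
            postings[token].append(row_id)
    # Tokens that match too many rows are not stored at all.
    return {
        token: ids
        for token, ids in postings.items()
        if len(ids) <= max_rows_per_postings_list
    }


rows = ["error disk full", "error timeout", "error disk slow"]
index = build_postings(rows, max_rows_per_postings_list=2)
```

In this toy data set `error` occurs in all three rows, so it is left out of `index`, while `disk` keeps its postings list `[0, 2]`. A query for a dropped token would fall back to scanning, which is the trade-off the entry describes.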
Improvement
- Functions `toDayOfWeek` (MySQL alias: `DAYOFWEEK`), `toYearWeek` (`YEARWEEK`) and `toWeek` (`WEEK`) now support `String` arguments. This makes their behavior consistent with MySQL's behavior. #55589 (Robert Schulze).
- Introduced setting `date_time_overflow_behavior` with possible values `ignore`, `throw`, `saturate` that controls the overflow behavior when converting from Date, Date32, DateTime64, Integer or Float to Date, Date32, DateTime or DateTime64. #55696 (Andrey Zvonov).
- Implement query parameters support for `ALTER TABLE ... ACTION PARTITION [ID] {parameter_name:ParameterType}`. Merges #49516. Closes #49449. #55604 (alesapin).
- Print processor ids in a prettier manner in EXPLAIN. #48852 (Vlad Seliverstov).
- Creating a direct dictionary with a lifetime field will be rejected at create time (as the lifetime does not make sense for direct dictionaries). Fixes: #27861. #49043 (Rory Crispin).
- Allow parameters in queries with partitions like `ALTER TABLE t DROP PARTITION`. Closes #49449. #49516 (Nikolay Degterinsky).
- Add a new column `xid` for `system.zookeeper_connection`. #50702 (helifu).
- Display the correct server settings in `system.server_settings` after configuration reload. #53774 (helifu).
- Add support for mathematical minus `−` character in queries, similar to `-`. #54100 (Alexey Milovidov).
- Add replica groups to the experimental `Replicated` database engine. Closes #53620. #54421 (Nikolay Degterinsky).
- It is better to retry retriable s3 errors than totally fail the query. Set bigger value to the s3_retry_attempts by default. #54770 (Sema Checherinda).
- Add load balancing mode `hostname_levenshtein_distance`. #54826 (JackyWoo).
- Improve hiding secrets in logs. #55089 (Vitaly Baranov).
- For now the projection analysis will be performed only on top of query plan. The setting `query_plan_optimize_projection` became obsolete (it was enabled by default long time ago). #55112 (Nikita Mikhaylov).
- When function `untuple` is called on a tuple with named elements and itself has an alias (e.g. `select untuple(tuple(1)::Tuple(element_alias Int)) AS untuple_alias`), then the result column name is now generated from the untuple alias and the tuple element alias (in the example: "untuple_alias.element_alias"). #55123 (garcher22).
- Added setting `describe_include_virtual_columns`, which allows to include virtual columns of table into result of `DESCRIBE` query. Added setting `describe_compact_output`. If it is set to `true`, `DESCRIBE` query returns only names and types of columns without extra information. #55129 (Anton Popov).
- Sometimes `OPTIMIZE` with `optimize_throw_if_noop=1` may fail with an error `unknown reason` while the real cause of it is different projections in different parts. This behavior is fixed. #55130 (Nikita Mikhaylov).
- Allow to have several `MaterializedPostgreSQL` tables following the same Postgres table. By default this behaviour is not enabled (for compatibility, because it is a backward-incompatible change), but can be turned on with setting `materialized_postgresql_use_unique_replication_consumer_identifier`. Closes #54918. #55145 (Kseniia Sumarokova).
- Allow to parse negative `DateTime64` and `DateTime` with fractional part from short strings. #55146 (Andrey Zvonov).
- To improve compatibility with MySQL, 1. `information_schema.tables` now includes the new field `table_rows`, and 2. `information_schema.columns` now includes the new field `extra`. #55215 (Robert Schulze).
- Clickhouse-client won't show "0 rows in set" if it is zero and if exception was thrown. #55240 (Salvatore Mesoraca).
- Support rename table without keyword `TABLE` like `RENAME db.t1 to db.t2`. #55373 (凌涛).
- Add `internal_replication` to `system.clusters`. #55377 (Konstantin Morozov).
- Select remote proxy resolver based on request protocol, add proxy feature docs and remove `DB::ProxyConfiguration::Protocol::ANY`. #55430 (Arthur Passos).
- Avoid retrying keeper operations on INSERT after table shutdown. #55519 (Azat Khuzhin).
- `SHOW COLUMNS` now correctly reports type `FixedString` as `BLOB` if setting `use_mysql_types_in_show_columns` is on. Also added two new settings, `mysql_map_string_to_text_in_show_columns` and `mysql_map_fixed_string_to_text_in_show_columns` to switch the output for types `String` and `FixedString` as `TEXT` or `BLOB`. #55617 (Serge Klochkov).
- During ReplicatedMergeTree tables startup clickhouse server checks set of parts for unexpected parts (exists locally, but not in zookeeper). All unexpected parts move to detached directory and instead of them server tries to restore some ancestor (covered) parts. Now server tries to restore closest ancestors instead of random covered parts. #55645 (alesapin).
- The advanced dashboard now supports draggable charts on touch devices. This closes #54206. #55649 (Alexey Milovidov).
- Use the default query format if declared when outputting exception with `http_write_exception_in_output_format`. #55739 (Raúl Marín).
- Provide a better message for common MATERIALIZED VIEW pitfalls. #55826 (Raúl Marín).
- If you dropped the current database, you will still be able to run some queries in `clickhouse-local` and switch to another database. This makes the behavior consistent with `clickhouse-client`. This closes #55834. #55853 (Alexey Milovidov).
- Functions `(add|subtract)(Year|Quarter|Month|Week|Day|Hour|Minute|Second|Millisecond|Microsecond|Nanosecond)` now support string-encoded date arguments, e.g. `SELECT addDays('2023-10-22', 1)`. This increases compatibility with MySQL and is needed by Tableau Online. #55869 (Robert Schulze).
- The setting `apply_deleted_mask` when disabled allows to read rows that were marked as deleted by lightweight DELETE queries. This is useful for debugging. #55952 (Alexander Gololobov).
- Allow skipping `null` values when serializing Tuple to JSON objects, which makes it possible to keep compatibility with Spark's `to_json` function, which is also useful for gluten. #55956 (李扬).
- Functions `(add|sub)Date` now support string-encoded date arguments, e.g. `SELECT addDate('2023-10-22 11:12:13', INTERVAL 5 MINUTE)`. The same support for string-encoded date arguments is added to the plus and minus operators, e.g. `SELECT '2023-10-23' + INTERVAL 1 DAY`. This increases compatibility with MySQL and is needed by Tableau Online. #55960 (Robert Schulze).
- Allow unquoted strings with CR (`\r`) in CSV format. Closes #39930. #56046 (Kruglov Pavel).
- Allow to run `clickhouse-keeper` using embedded config. #56086 (Maksim Kita).
- Set limit of the maximum configuration value for `queued.min.messages` to avoid problem with start fetching data with Kafka. #56121 (Stas Morozov).
- Fixed a typo in SQL function `minSampleSizeContinous` (renamed `minSampleSizeContinuous`). Old name is preserved for backward compatibility. This closes: #56139. #56143 (Dorota Szeremeta).
- Print path for broken parts on disk before shutting down the server. Before this change if a part is corrupted on disk and server cannot start, it was almost impossible to understand which part is broken. This is fixed. #56181 (Duc Canh Le).
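The three values of `date_time_overflow_behavior` introduced in this list can be illustrated with a toy conversion of an integer number of seconds to a `DateTime`-like value. The sketch assumes the target is an unsigned 32-bit count of seconds. The names `Overflow` and `to_datetime_seconds` are hypothetical, and the wraparound used for `ignore` is purely an assumption for illustration, since the changelog entry does not say what an ignored overflow produces.

```python
from enum import Enum

UINT32_MAX = 2**32 - 1  # assumed upper bound of the target type, in seconds


class Overflow(Enum):
    IGNORE = "ignore"
    THROW = "throw"
    SATURATE = "saturate"


def to_datetime_seconds(value, mode):
    """Convert an integer to an in-range second count according to `mode`."""
    if 0 <= value <= UINT32_MAX:
        return value  # no overflow, all modes agree
    if mode is Overflow.THROW:
        raise OverflowError(f"{value} is out of range")
    if mode is Overflow.SATURATE:
        return 0 if value < 0 else UINT32_MAX  # clamp to the nearest bound
    # IGNORE: wrap around modulo 2**32 (an assumption of this sketch).
    return value % 2**32
```

With `saturate`, an out-of-range input is clamped to the nearest representable bound, whereas `throw` turns a silent data-quality problem into a visible error.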
Build/Testing/Packaging Improvement
- If the database in Docker is already initialized, it doesn't need to be initialized again upon subsequent launches. This can potentially fix the issue of infinite container restarts when the database fails to load within 1000 attempts (relevant for very large databases and multi-node setups). #50724 (Alexander Nikolaev).
- Resource with source code including submodules is built in Darwin special build task. It may be used to build ClickHouse without checking out the submodules. #51435 (Ilya Yatsishin).
- An error was occurring when building ClickHouse with the AVX series of instructions enabled globally (which isn't recommended). The reason is that snappy does not enable `SNAPPY_HAVE_X86_CRC32`. #55049 (monchickey).
- Solve issue with launching standalone `clickhouse-keeper` from `clickhouse-server` package. #55226 (Mikhail f. Shiryaev).
- In the tests, RabbitMQ version is updated to 3.12.6. Improved logs collection for RabbitMQ tests. #55424 (Ilya Yatsishin).
- Modified the error message difference between openssl and boringssl to fix the functional test. #55975 (MeenaRenganathan22).
- Use upstream repo for apache datasketches. #55787 (Nikita Taranov).
Bug Fix (user-visible misbehavior in an official stable release)
- Skip hardlinking inverted index files in mutation #47663 (cangyin).
- Fixed a bug where the
match
function (regex) with a pattern containing alternation produced an incorrect key condition. Closes #53222. #54696 (Yakov Olkhovskiy). - Fix 'Cannot find column' in read-in-order optimization with ARRAY JOIN #51746 (Nikolai Kochetov).
- Support missed experimental
Object(Nullable(json))
subcolumns in query. #54052 (zps). - Re-add fix for
accurateCastOrNull
#54629 (Salvatore Mesoraca). - Fix detecting
DEFAULT
for columns of a Distributed table created without AS #55060 (Vitaly Baranov). - Proper cleanup in case of exception in ctor of ShellCommandSource #55103 (Alexander Gololobov).
- Fix deadlock in LDAP assigned role update #55119 (Julian Maicher).
- Suppress error statistics update for internal exceptions #55128 (Robert Schulze).
- Fix deadlock in backups #55132 (alesapin).
- Fix storage Iceberg files retrieval #55144 (Kseniia Sumarokova).
- Fix partition pruning of extra columns in set. #55172 (Amos Bird).
- Fix recalculation of skip indexes in ALTER UPDATE queries when table has adaptive granularity #55202 (Duc Canh Le).
- Fix for background download in fs cache #55252 (Kseniia Sumarokova).
- Avoid possible memory leaks in compressors in case of missing buffer finalization #55262 (Azat Khuzhin).
- Fix functions execution over sparse columns #55275 (Azat Khuzhin).
- Fix incorrect merging of Nested for SELECT FINAL FROM SummingMergeTree #55276 (Azat Khuzhin).
- Fix bug with inability to drop detached partition in replicated merge tree on top of S3 without zero copy #55309 (alesapin).
- Fix a crash in MergeSortingPartialResultTransform (due to zero chunks after
remerge
) #55335 (Azat Khuzhin). - Fix data-race in CreatingSetsTransform (on errors) due to throwing shared exception #55338 (Azat Khuzhin).
- Fix trash optimization (up to a certain extent) #55353 (Alexey Milovidov).
- Fix leak in StorageHDFS #55370 (Azat Khuzhin).
- Fix parsing of arrays in cast operator #55417 (Anton Popov).
- Fix filtering by virtual columns with OR filter in query #55418 (Azat Khuzhin).
- Fix MongoDB connection issues #55419 (Nikolay Degterinsky).
- Fix MySQL interface boolean representation #55427 (Serge Klochkov).
- Fix MySQL text protocol DateTime formatting and LowCardinality(Nullable(T)) types reporting #55479 (Serge Klochkov).
- Make
use_mysql_types_in_show_columns
affect only SHOW COLUMNS
#55481 (Robert Schulze). - Fix stack symbolizer parsing
DW_FORM_ref_addr
incorrectly and sometimes crashing #55483 (Michael Kolupaev). - Destroy fiber in case of exception in cancelBefore in AsyncTaskExecutor #55516 (Kruglov Pavel).
- Fix Query Parameters not working with custom HTTP handlers #55521 (Konstantin Bogdanov).
- Fix checking of non handled data for Values format #55527 (Azat Khuzhin).
- Fix 'Invalid cursor state' in odbc interacting with MS SQL Server #55558 (vdimir).
- Fix max execution time and 'break' overflow mode #55577 (Alexander Gololobov).
- Fix crash in QueryNormalizer with cyclic aliases #55602 (vdimir).
- Disable wrong optimization and add a test #55609 (Alexey Milovidov).
- Merging #52352 #55621 (Alexey Milovidov).
- Add a test to avoid incorrect decimal sorting #55662 (Amos Bird).
- Fix progress bar for s3 and azure Cluster functions with url without globs #55666 (Kruglov Pavel).
- Fix filtering by virtual columns with OR filter in query (resubmit) #55678 (Azat Khuzhin).
- Fixes and improvements for Iceberg storage #55695 (Kruglov Pavel).
- Fix data race in CreatingSetsTransform (v2) #55786 (Azat Khuzhin).
- Throw exception when parsing illegal string as float if precise_float_parsing is true #55861 (李扬).
- Disable predicate pushdown if the CTE contains stateful functions #55871 (Raúl Marín).
- Fix normalize ASTSelectWithUnionQuery, as it was stripping
FORMAT
from the query #55887 (flynn). - Try to fix possible segfault in Native ORC input format #55891 (Kruglov Pavel).
- Fix window functions in case of sparse columns. #55895 (János Benjamin Antal).
- fix: StorageNull supports subcolumns #55912 (FFish).
- Do not write retriable errors for Replicated mutate/merge into error log #55944 (Azat Khuzhin).
- Fix
SHOW DATABASES LIMIT <N>
#55962 (Raúl Marín). - Fix autogenerated Protobuf schema with fields with underscore #55974 (Kruglov Pavel).
- Fix dateTime64ToSnowflake64() with non-default scale #55983 (Robert Schulze).
- Fix output/input of Arrow dictionary column #55989 (Kruglov Pavel).
- Fix fetching schema from schema registry in AvroConfluent #55991 (Kruglov Pavel).
- Fix 'Block structure mismatch' on concurrent ALTER and INSERTs in Buffer table #55995 (Michael Kolupaev).
- Fix incorrect free space accounting for least_used JBOD policy #56030 (Azat Khuzhin).
- Fix missing scalar issue when evaluating subqueries inside table functions #56057 (Amos Bird).
- Fix wrong query result when http_write_exception_in_output_format=1 #56135 (Kruglov Pavel).
- Fix schema cache for fallback JSON->JSONEachRow with changed settings #56172 (Kruglov Pavel).
- Add error handler to odbc-bridge #56185 (Yakov Olkhovskiy).
ClickHouse release 23.9, 2023-09-28
Backward Incompatible Change
- Remove the
status_info
configuration option and dictionaries status from the default Prometheus handler. #54090 (Alexey Milovidov). - The experimental parts metadata cache is removed from the codebase. #54215 (Alexey Milovidov).
- Disable setting
input_format_json_try_infer_numbers_from_strings
by default, so numbers are no longer inferred from strings in JSON formats unless the setting is enabled. This avoids possible parsing errors when sample data contains strings that look like numbers. #55099 (Kruglov Pavel).
New Feature
- Improve schema inference from JSON formats: 1) Now it's possible to infer named Tuples from JSON objects without experimental JSON type under a setting
input_format_json_try_infer_named_tuples_from_objects
in JSON formats. Previously, without the experimental JSON type, we could only infer JSON objects as Strings or Maps; now we can infer named Tuples. The resulting Tuple type will contain all keys of objects that were read in the data sample during schema inference. It can be useful for reading structured JSON data without sparse objects. The setting is enabled by default. 2) Allow parsing JSON array into a column with type String under setting input_format_json_read_arrays_as_strings
. It can help reading arrays with values of different types. 3) Allow to use type String for JSON keys with unknown types (null
/[]
/{}
) in sample data under setting input_format_json_infer_incomplete_types_as_strings
. Now in JSON formats we can read any value into a String column and avoid the error Cannot determine type for column 'column_name' by first 25000 rows of data, most likely this column contains only Nulls or empty Arrays/Maps
during schema inference by using type String for unknown types, so the data will be read successfully. #54427 (Kruglov Pavel). - Added IO scheduling support for remote disks. Storage configuration for disk types
s3
, s3_plain
, hdfs
and azure_blob_storage
can now contain read_resource
and write_resource
elements holding resource names. Scheduling policies for these resources can be configured in a separate server configuration section resources
. Queries can be marked using setting workload
and classified using server configuration section workload_classifiers
to achieve diverse resource scheduling goals. More details in the docs. #47009 (Sergei Trifonov). Added "bandwidth_limit" IO scheduling node type. It allows you to specify max_speed
and max_burst
constraints on traffic passing through this node. #54618 (Sergei Trifonov). - Added new type of authentication based on SSH keys. It works only for the native TCP protocol. #41109 (George Gamezardashvili).
- Added a new column
_block_number
for MergeTree tables. #44532. #47532 (SmitaRKulkarni). - Add
IF EMPTY
clause for DROP TABLE
queries. #48915 (Pavel Novitskiy). - SQL functions
toString(datetime, timezone)
and formatDateTime(datetime, format, timezone)
now support non-constant timezone arguments. #53680 (Yarik Briukhovetskyi). - Add support for
ALTER TABLE MODIFY COMMENT
. Note: something similar was added by an external contributor a long time ago, but the feature did not work at all and only confused users. This closes #36377. #51304 (Alexey Milovidov). Note: this command does not propagate between replicas, so the replicas of a table could have different comments. - Added
GCD
a.k.a. "greatest common divisor" as a new data compression codec. The codec computes the GCD of all column values, and then divides each value by the GCD. The GCD codec is a data preparation codec (similar to Delta and DoubleDelta) and cannot be used stand-alone. It works with data of integer, decimal and date/time types. A viable use case for the GCD codec is column values that change (increase/decrease) in multiples of the GCD, e.g. 24 - 28 - 16 - 24 - 8 - 24 (assuming GCD = 4). #53149 (Alexander Nam). - Two new type aliases
DECIMAL(P)
(as shortcut for DECIMAL(P, 0)
) and DECIMAL
(as shortcut for DECIMAL(10, 0)
) were added. This makes ClickHouse more compatible with MySQL's SQL dialect. #53328 (Val Doroshchuk). - Added a new system log table
backup_log
to track all BACKUP
and RESTORE
operations. #53638 (Victor Krasnov). - Added a format setting
output_format_markdown_escape_special_characters
(default: false). The setting controls whether special characters like !
, #
, $
etc. are escaped (i.e. prefixed by a backslash) in the Markdown
output format. #53860 (irenjj). - Add function
decodeHTMLComponent
. #54097 (Bharat Nallan). - Added
peak_threads_usage
to query_log table. #54335 (Alexey Gerasimchuck). - Add
SHOW FUNCTIONS
support to clickhouse-client. #54337 (Julia Kartseva). - Added function
toDaysSinceYearZero
with alias TO_DAYS
(for compatibility with MySQL) which returns the number of days passed since 0001-01-01
(in Proleptic Gregorian Calendar). #54479 (Robert Schulze). Function toDaysSinceYearZero
now supports arguments of type DateTime
and DateTime64
. #54856 (Serge Klochkov). - Added functions
YYYYMMDDtoDate
, YYYYMMDDtoDate32
, YYYYMMDDhhmmssToDateTime
and YYYYMMDDhhmmssToDateTime64
. They convert a date or date with time encoded as integer (e.g. 20230911) into a native date or date with time. As such, they provide the opposite functionality of existing functions YYYYMMDDToDate
, YYYYMMDDToDateTime
, YYYYMMDDhhmmddToDateTime
, YYYYMMDDhhmmddToDateTime64
. #54509 (Quanfa Fu) (Robert Schulze). - Add several string distance functions, including
byteHammingDistance
, editDistance
. #54935 (flynn). - Allow specifying the expiration date and, optionally, the time for user credentials with
VALID UNTIL datetime
clause. #51261 (Nikolay Degterinsky). - Allow S3-style URLs for table functions
s3
, gcs
, oss
. URL is automatically converted to HTTP. Example: 's3://clickhouse-public-datasets/hits.csv'
is converted to 'https://clickhouse-public-datasets.s3.amazonaws.com/hits.csv'
. #54931 (Yarik Briukhovetskyi). - Add new setting
print_pretty_type_names
to print pretty deep nested types like Tuple/Maps/Arrays. #55095 (Kruglov Pavel).
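
The GCD codec entry above describes a simple transformation: compute the greatest common divisor of all values in a column, then store each value divided by it. The following is a minimal Python sketch of that idea only, not ClickHouse's implementation; the function names gcd_encode and gcd_decode are hypothetical, and the real codec operates on binary column data and is combined with a compression codec.

```python
from functools import reduce
from math import gcd


def gcd_encode(values):
    """Return (divisor, quotients) for a list of integers (hypothetical helper)."""
    divisor = reduce(gcd, values, 0)
    if divisor == 0:
        # Empty input or all zeros: there is nothing to divide by.
        return 0, list(values)
    return divisor, [v // divisor for v in values]


def gcd_decode(divisor, quotients):
    """Invert gcd_encode by multiplying each quotient by the divisor."""
    if divisor == 0:
        return list(quotients)
    return [q * divisor for q in quotients]


# The sequence from the changelog entry, where the GCD is 4.
values = [24, 28, 16, 24, 8, 24]
divisor, quotients = gcd_encode(values)
print(divisor, quotients)  # 4 [6, 7, 4, 6, 2, 6]
assert gcd_decode(divisor, quotients) == values
```

The quotients are smaller than the original values, which is why such a step can help a subsequent compression codec, and also why the entry describes GCD as a data preparation codec rather than a stand-alone one.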
Performance Improvement
- Speed up reading from S3 by enabling prefetches by default. #53709 (Alexey Milovidov).
- Do not implicitly read PK and version columns in lonely parts if unnecessary for queries with FINAL. #53919 (Duc Canh Le).
- Optimize group by constant keys. Will optimize queries with group by
_file/_path
after https://github.com/ClickHouse/ClickHouse/pull/53529. #53549 (Kruglov Pavel). - Improve performance of sorting for
Decimal
columns. Improve performance of insertion into MergeTree
if ORDER BY contains a Decimal
column. Improve performance of sorting when data is already sorted or almost sorted. #35961 (Maksim Kita). - Improve performance for huge query analysis. Fixes #51224. #51469 (frinkr).
- An optimization to rewrite
COUNT(DISTINCT ...)
and various uniq
variants to count
if it is selected from a subquery with GROUP BY. #52082 #52645 (JackyWoo). - Remove manual calls to
mmap/mremap/munmap
and delegate all this work to jemalloc
, which slightly improves performance. #52792 (Nikita Taranov). - Fixed high CPU consumption when working with NATS. #54399 (Vasilev Pyotr).
- Since we use separate instructions for executing
toString
with datetime argument, it is possible to improve performance a bit for non-datetime arguments and have some parts of the code cleaner. Follows up #53680. #54443 (Yarik Briukhovetskyi). - Instead of serializing json elements into a
std::stringstream
, this PR tries to put the serialization result into ColumnString
directly. #54613 (lgbo). - Enable ORDER BY optimization for reading data in corresponding order from a MergeTree table in case that the table is behind a view. #54628 (Vitaly Baranov).
- Improve JSON SQL functions by reusing
GeneratorJSONPath
and removing several shared pointers. #54735 (lgbo). - Keeper tries to batch flush requests for better performance. #53049 (Antonio Andelic).
- Now
clickhouse-client
processes files in parallel in case of INFILE 'glob_expression'
. Closes #54218. #54533 (Max K.). - Allow to use primary key for IN function where primary key column types are different from
IN
function right side column types. Example: SELECT id FROM test_table WHERE id IN (SELECT '5')
. Closes #48936. #54544 (Maksim Kita). - Hash JOIN tries to shrink internal buffers consuming half of maximal available memory (set by
max_bytes_in_join
). #54584 (vdimir). - Respect
max_block_size
for array join to avoid possible OOM. Close #54290. #54664 (