Lakehouse runtime catalog (BigLake Metastore)
ClickHouse supports integration with multiple catalogs (Unity, Glue, Polaris, etc.). This guide walks you through querying your Iceberg tables in the Lakehouse runtime catalog (also known as BigLake Metastore) with ClickHouse.
As this feature is beta, you will need to enable it first:

```sql
SET allow_database_iceberg = 1;
```
Prerequisites
Before creating a connection from ClickHouse to Lakehouse runtime catalog (BigLake Metastore), ensure you have:
- A Google Cloud project with Lakehouse runtime catalog enabled
- OAuth credentials (a client ID and client secret) for an application created via the Google Cloud Console
- A refresh token obtained by completing the OAuth flow with the appropriate scopes (e.g. https://www.googleapis.com/auth/bigquery and a storage scope for GCS)
- A warehouse path: a GCS bucket (and optional prefix) where your tables are stored, e.g. gs://your-bucket or gs://your-bucket/prefix
Creating a connection between Lakehouse runtime catalog and ClickHouse
With the OAuth credentials in place, create a database in ClickHouse that uses the DataLakeCatalog database engine:
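A sketch of what this statement might look like (the endpoint URL, setting names, and values below are assumptions; check the DataLakeCatalog engine reference for your ClickHouse version):

```sql
-- Hypothetical example: adjust the endpoint, warehouse, and auth settings to your setup
CREATE DATABASE biglake
ENGINE = DataLakeCatalog('https://biglake.googleapis.com/iceberg/v1/restcatalog')
SETTINGS
    catalog_type = 'rest',
    warehouse = 'gs://your-bucket',  -- your warehouse path from the prerequisites
    auth_scope = 'https://www.googleapis.com/auth/bigquery';  -- OAuth scope from the prerequisites
```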
Querying Lakehouse runtime catalog tables using ClickHouse
Once the connection is created, you can query tables registered in the Lakehouse runtime catalog. Backticks are required around table names because ClickHouse doesn't support more than one namespace, so the full namespace path becomes part of the table name.
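For example, assuming the catalog database was created as `biglake` and contains a table `my_table` under the namespace `my_namespace` (all names here are illustrative):

```sql
-- List the tables registered in the catalog database
SHOW TABLES FROM biglake;

-- The namespace is part of the table name, hence the backticks
SELECT count() FROM biglake.`my_namespace.my_table`;
```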
To inspect the table definition:
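Continuing with the illustrative database and table names from above:

```sql
SHOW CREATE TABLE biglake.`my_namespace.my_table`;
```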
Loading data from Lakehouse into ClickHouse
To load data from a Lakehouse runtime catalog table into a local ClickHouse table for faster repeated queries, create a MergeTree table and insert from the catalog:
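A minimal sketch, assuming the illustrative catalog names from earlier and a two-column schema (adapt the columns and the ORDER BY key to your table):

```sql
-- Local MergeTree copy for fast repeated queries (schema is illustrative)
CREATE TABLE clickhouse_table
(
    id UInt64,
    name String
)
ENGINE = MergeTree
ORDER BY id;

-- Initial load; re-run to refresh from the catalog
INSERT INTO clickhouse_table
SELECT id, name
FROM biglake.`my_namespace.my_table`;
```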
After the initial load, query `clickhouse_table` for lower latency. Re-run the `INSERT INTO ... SELECT` to refresh the data from BigLake when needed.