It is column-oriented and allows generating analytical reports using SQL queries in real time.
ClickHouse's performance exceeds that of comparable column-oriented database management systems available on the market. It processes hundreds of millions to over a billion rows and tens of gigabytes of data per server per second.
ClickHouse uses all available hardware to its full potential to process each query as fast as possible. Peak processing performance for a single query stands at more than 2 terabytes per second (after decompression, only used columns). In a distributed setup, reads are automatically balanced among healthy replicas to avoid increasing latency.
ClickHouse supports multi-master asynchronous replication and can be deployed across multiple datacenters. All nodes are equal, which avoids single points of failure. Downtime of a single node or of a whole datacenter won't affect the system's availability for either reads or writes.
ClickHouse is simple and works out of the box. It streamlines all your data processing: ingest all your structured data into the system and it becomes instantly available for building reports. The SQL dialect lets you express the desired result without any custom non-standard API of the kind found in some alternative systems.
The ClickHouse DBMS can be configured as a purely distributed system located on independent nodes, without any single point of failure. It also includes many enterprise-grade security features and fail-safe mechanisms against human error.
ClickHouse processes typical analytical queries two to three orders of magnitude faster than traditional row-oriented systems with the same available I/O throughput and CPU capacity. The columnar storage format allows fitting more hot data in RAM, which leads to shorter typical response times.
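To illustrate why a columnar layout helps analytical queries, here is a minimal Python sketch, not ClickHouse code. It models a hypothetical table in both layouts and counts the bytes a single-column aggregate would have to read. The table size, the function names and the byte accounting are all assumptions made for the example.

```python
from array import array

# Hypothetical table: 1,000 rows, 50 numeric columns of 8 bytes each.
NUM_ROWS, NUM_COLS, VALUE_BYTES = 1_000, 50, 8

# Row-oriented layout: all columns of a record are stored together.
rows = [[float(r * NUM_COLS + c) for c in range(NUM_COLS)]
        for r in range(NUM_ROWS)]

# Column-oriented layout: one contiguous array per column.
columns = [array("d", (row[c] for row in rows)) for c in range(NUM_COLS)]


def sum_row_layout(col):
    """Sum one column; every full row must be read to reach its value."""
    total = sum(row[col] for row in rows)
    bytes_read = NUM_ROWS * NUM_COLS * VALUE_BYTES  # modeled, not measured
    return total, bytes_read


def sum_column_layout(col):
    """Sum one column; only that column's array is read."""
    data = columns[col]
    total = sum(data)
    bytes_read = len(data) * data.itemsize  # modeled, not measured
    return total, bytes_read


row_total, row_bytes = sum_row_layout(3)
col_total, col_bytes = sum_column_layout(3)
print("same result:", row_total == col_total)
print("bytes read (row layout vs column layout):", row_bytes, col_bytes)
```

In this model the column layout reads roughly `NUM_COLS` times fewer bytes for the same answer, which is also why more of the frequently queried data fits in RAM.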
Total cost of ownership can be lowered further by using commodity hardware with rotating disk drives instead of enterprise-grade NVMe or SSD storage, without significant sacrifices in latency for most kinds of queries.
Vectorized query execution uses the relevant SIMD processor instructions and runtime code generation. Processing data in columns increases the CPU cache hit rate.
ClickHouse minimizes the number of seeks for range queries, which makes rotational disk drives more efficient to use, because it maintains locality of reference for contiguously stored data.
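The idea of answering a range query with one seek and one sequential read can be sketched as follows. This is an illustrative Python model, not ClickHouse's storage code. It assumes data sorted by key, split into fixed-size blocks (called granules here), with a sparse index holding only the first key of each block. The names `build_sparse_index`, `range_query` and `GRANULE_SIZE` are hypothetical.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # hypothetical; chosen small so the example is easy to follow


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of every granule."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def range_query(sorted_keys, index, low, high, granule_size=GRANULE_SIZE):
    """Return keys in [low, high] and the (first, last) granules read.

    The matching granules are adjacent, so they can be fetched with a
    single seek followed by one sequential read.
    """
    # Start one granule before the first index entry >= low, so keys equal
    # to `low` at the end of the previous granule are not missed.
    first = max(bisect_left(index, low) - 1, 0)
    # The last granule is the last one whose first key is <= high.
    last = bisect_right(index, high) - 1
    if last < first:
        return [], None
    start = first * granule_size
    stop = min((last + 1) * granule_size, len(sorted_keys))
    block = sorted_keys[start:stop]  # one contiguous slice
    return [k for k in block if low <= k <= high], (first, last)


keys = list(range(0, 40, 2))  # sorted keys 0, 2, ..., 38
index = build_sparse_index(keys)
hits, span = range_query(keys, index, 10, 20)
print(hits, span)
```

Because the data is sorted by the indexed key, the matching rows always form one contiguous run, and the index itself stays small enough to keep in memory.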
ClickHouse enables companies to manage their data and create reports without using specialized networks that are aimed at high-performance computing.
ClickHouse features a SQL query dialect with a number of built-in analytics capabilities. In addition to common functions that can be found in most DBMS, ClickHouse comes with many domain-specific functions and features for OLAP scenarios out of the box.
The column-oriented nature of ClickHouse allows having hundreds or thousands of columns per table without slowing down SELECT queries. It's possible to pack even more data in by leveraging a wide range of data-organizing options, such as arrays, tuples and nested data structures.
ClickHouse provides various options for joining tables. Joins can be cluster-local, or they can access data stored in external systems. There is also support for external dictionaries, which provides an alternative, simpler syntax for accessing data from an outside source.
Users can control the trade-off between result accuracy and query execution time, which is handy when dealing with multiple terabytes or petabytes of data. ClickHouse also provides probabilistic data structures for fast and memory-efficient calculation of cardinalities and quantiles.
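The accuracy-versus-time trade-off can be illustrated with a small Python sketch of hash-based sampling. It is a conceptual model under stated assumptions, not ClickHouse's implementation. The data set, the 1-in-10 sampling rate and the function names are hypothetical, and the probabilistic data structures mentioned above are not shown.

```python
import hashlib

SAMPLE_DENOMINATOR = 10  # hypothetical: keep roughly 1 key in 10


def in_sample(user_id):
    """Deterministically select about 1/SAMPLE_DENOMINATOR of keys by hash."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % SAMPLE_DENOMINATOR == 0


def exact_sum(rows):
    """Aggregate every row."""
    return sum(value for _, value in rows)


def approximate_sum(rows):
    """Aggregate only the sampled rows, then scale by the sampling factor."""
    sampled = [value for user_id, value in rows if in_sample(user_id)]
    return sum(sampled) * SAMPLE_DENOMINATOR, len(sampled)


# Hypothetical data set of (user_id, value) pairs.
rows = [(i, (i % 7) + 1) for i in range(100_000)]

exact = exact_sum(rows)
estimate, rows_used = approximate_sum(rows)
print("exact:", exact)
print("estimate:", estimate, "from", rows_used, "rows")
```

The sketch still scans every row to decide whether it belongs to the sample. A real system would organize the data so that the sample can be read without touching the rest, which is where the time saving comes from.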
ClickHouse scales well both vertically and horizontally. It adapts easily to run on a cluster with hundreds or thousands of nodes, on a single server, or even on a tiny virtual machine. Currently, there are installations with multiple trillion rows or hundreds of terabytes of data per single node.
There are many ClickHouse clusters consisting of several hundred nodes, including a few clusters of Yandex Metrica, while the largest known ClickHouse cluster is well over a thousand nodes.
ClickHouse is suited for analytics over a stream of clean, well-structured and immutable events or logs. It is recommended to put each such stream into a single wide fact table with pre-joined dimensions.
System requirements for pre-built packages: Linux, x86_64 with SSE 4.2.
sudo apt-get install apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4
echo "deb https://repo.clickhouse.tech/deb/stable/ main/" | sudo tee \
    /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client
sudo service clickhouse-server start
clickhouse-client

sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/clickhouse.repo
sudo yum install clickhouse-server clickhouse-client
sudo /etc/init.d/clickhouse-server start
clickhouse-client

export LATEST_VERSION=$(curl -s https://repo.clickhouse.tech/tgz/stable/ | \
    grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort -V -r | head -n 1)
curl -O https://repo.clickhouse.tech/tgz/stable/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/stable/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/stable/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/stable/clickhouse-client-$LATEST_VERSION.tgz
tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz
sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-server-$LATEST_VERSION.tgz
sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh
sudo /etc/init.d/clickhouse-server start
tar -xzvf clickhouse-client-$LATEST_VERSION.tgz
sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh
For other operating systems, the easiest way to get started is to use the official Docker images of ClickHouse, though this is not the only option. Alternatively, you can get a running ClickHouse instance or cluster at Yandex Managed Service for ClickHouse.
After you have connected to your ClickHouse server, you can proceed to:
ClickHouse meetups are essential for strengthening the community worldwide, but they wouldn't be possible without the help of local organizers. Please fill in this form if you want to become one or want to meet the ClickHouse core team for any other reason.
If you have any more thoughts or questions, feel free to contact the Yandex ClickHouse team directly.