Column Manipulations 

A set of queries that allow changing the table structure.

Syntax:

ALTER TABLE [db].name [ON CLUSTER cluster] ADD|DROP|CLEAR|COMMENT|MODIFY COLUMN ...

In the query, specify a list of one or more comma-separated actions.
Each action is an operation on a column.

The following actions are supported:

These actions are described in detail below.

ADD COLUMN 

ADD COLUMN [IF NOT EXISTS] name [type] [default_expr] [codec] [AFTER name_after]

Adds a new column to the table with the specified name, type, codec and default_expr (see the section Default expressions).

If the IF NOT EXISTS clause is included, the query won’t return an error if the column already exists. If you specify AFTER name_after (the name of another column), the column is added after the specified one in the list of table columns. Otherwise, the column is added to the end of the table. Note that there is no way to add a column to the beginning of a table. For a chain of actions, name_after can be the name of a column that is added in one of the previous actions.

Adding a column just changes the table structure, without performing any actions with data. The data doesn’t appear on the disk after ALTER. If the data is missing for a column when reading from the table, it is filled in with default values (by performing the default expression if there is one, or using zeros or empty strings). The column appears on the disk after merging data parts (see MergeTree).

This approach allows us to complete the ALTER query instantly, without increasing the volume of old data.

Example:

ALTER TABLE visits ADD COLUMN browser String AFTER user_id

DROP COLUMN 

DROP COLUMN [IF EXISTS] name

Deletes the column with the name name. If the IF EXISTS clause is specified, the query won’t return an error if the column doesn’t exist.

Deletes data from the file system. Since this deletes entire files, the query is completed almost instantly.

Example:

ALTER TABLE visits DROP COLUMN browser

CLEAR COLUMN 

CLEAR COLUMN [IF EXISTS] name IN PARTITION partition_name

Resets all data in a column for a specified partition. Read more about setting the partition name in the section How to specify the partition expression.

If the IF EXISTS clause is specified, the query won’t return an error if the column doesn’t exist.

Example:

ALTER TABLE visits CLEAR COLUMN browser IN PARTITION tuple()

COMMENT COLUMN 

COMMENT COLUMN [IF EXISTS] name 'comment'

Adds a comment to the column. If the IF EXISTS clause is specified, the query won’t return an error if the column doesn’t exist.

Each column can have one comment. If a comment already exists for the column, a new comment overwrites the previous comment.

Comments are stored in the comment_expression column returned by the DESCRIBE TABLE query.

Example:

ALTER TABLE visits COMMENT COLUMN browser 'The table shows the browser used for accessing the site.'

MODIFY COLUMN 

MODIFY COLUMN [IF EXISTS] name [type] [default_expr] [TTL]

This query changes the name column properties:

  • Type

  • Default expression

  • TTL

    For examples of columns TTL modifying, see [Column TTL](../../engines/table_engines/mergetree_family/mergetree.md#mergetree-column-ttl).
    

If the IF EXISTS clause is specified, the query won’t return an error if the column doesn’t exist.

When changing the type, values are converted as if the toType functions were applied to them. If only the default expression is changed, the query doesn’t do anything complex, and is completed almost instantly.

Example:

ALTER TABLE visits MODIFY COLUMN browser Array(String)

Changing the column type is the only complex action – it changes the contents of files with data. For large tables, this may take a long time.

There are several processing stages:

  • Preparing temporary (new) files with modified data.
  • Renaming old files.
  • Renaming the temporary (new) files to the old names.
  • Deleting the old files.

Only the first stage takes time. If there is a failure at this stage, the data is not changed.
If there is a failure during one of the successive stages, data can be restored manually. The exception is if the old files were deleted from the file system but the data for the new files did not get written to the disk and was lost.

The ALTER query for changing columns is replicated. The instructions are saved in ZooKeeper, then each replica applies them. All ALTER queries are run in the same order. The query waits for the appropriate actions to be completed on the other replicas. However, a query to change columns in a replicated table can be interrupted, and all actions will be performed asynchronously.

Limitations 

The ALTER query lets you create and delete separate elements (columns) in nested data structures, but not whole nested data structures. To add a nested data structure, you can add columns with a name like name.nested_name and the type Array(T). A nested data structure is equivalent to multiple array columns with a name that has the same prefix before the dot.

There is no support for deleting columns in the primary key or the sampling key (columns that are used in the ENGINE expression). Changing the type for columns that are included in the primary key is only possible if this change does not cause the data to be modified (for example, you are allowed to add values to an Enum or to change a type from DateTime to UInt32).

If the ALTER query is not sufficient to make the table changes you need, you can create a new table, copy the data to it using the INSERT SELECT query, then switch the tables using the RENAME query and delete the old table. You can use the clickhouse-copier as an alternative to the INSERT SELECT query.

The ALTER query blocks all reads and writes for the table. In other words, if a long SELECT is running at the time of the ALTER query, the ALTER query will wait for it to complete. At the same time, all new queries to the same table will wait while this ALTER is running.

For tables that don’t store data themselves (such as Merge and Distributed), ALTER just changes the table structure, and does not change the structure of subordinate tables. For example, when running ALTER for a Distributed table, you will also need to run ALTER for the tables on all remote servers.

Rating: 3.1 - 9 votes

Was this content helpful?
★★★☆☆