0

I would like to delete a subset of the data contained in a clickhouse database rather than truncating the entire database. I am aware of the syntax relying on mutations described in this SO answer or in this SO comment.

For instance to delete the data associated with myid 42 from table mytable:

ALTER TABLE mytable DELETE WHERE myid = 42

This works as expected on a database hosted on a Linux VM (Ubuntu 18.04.5) under Clickhouse 20.11.2.1.

However, this command does not delete the data from a database hosted on a different Linux VM hosted on Azure (also Ubuntu 18.04.5) under Clickhouse 21.2.5.5.

Regardless of whether deletion is successfull or not, clickhouse does not return any error message, simply the Query id followed by Ok.

What could cause this issue? How can it be fixed?

Sheldon
  • 4,084
  • 3
  • 20
  • 41

2 Answers2

1

ALTER TABLE ... DELETE is a mutation. At Clickhouse mutations are asynchronous by default. So you data will be deleted with some background process (and not atomically). Probably you should wait.

Other option is to use mutations_sync setting: https://clickhouse.com/docs/en/operations/settings/settings/#mutations_sync . With that your request hangs until the data will be deleted (or until request_timeout)

Andrei Koch
  • 898
  • 1
  • 7
  • 23
  • Thanks for your reply Andrew. I am not sure whether waiting will fix the issue. It is true that one of the tables that I am working with is large (~8M rows), but I face similar issues with a smaller table (~44K rows) for which I expect the process to run quickly. Or maybe is the processing time not related to the size of the table? – Sheldon Nov 17 '21 at 19:07
  • 1
    For sure it may depends on size of the table. Also it may depends on how much rows are you deleting. Technically, `ALTER DELETE` (in the worst case) is performed by rewriting all the data (except that rows that you want to delete). So , deletion of one row may be very bad. Also, `DELETES` are not recommended for regular use, only in rare cases. – Andrei Koch Nov 17 '21 at 19:26
  • "Technically, ALTER DELETE (in the worst case) is performed by rewriting all the data". Thanks for this helpful comment: I was not aware that `ALTER` was working this way! `mutations_sync` does not seem like a very efficient way to remove a handful of rows either but at least it may allow to monitor when the rewriting is complete. – Sheldon Nov 17 '21 at 19:33
  • You also can watch the progress in `system.mutations` table. `SELECT * from system.mutations where not is_done` will show you all not finished mutations – Andrei Koch Nov 18 '21 at 04:49
0

The answer proposed by Andrei Koch is correct in the sense that the mutation takes some time.

However, my problem was related to the size of the table to alter.

max_table_size_to_drop is a Clickhouse server configuration parameter set to 50Gb. The table that I was trying to alter was larger (56Gb) than this default threshold, thus preventing deletion from working correctly. Changing the server settings fixed this issue.

Sheldon
  • 4,084
  • 3
  • 20
  • 41