I am struggling to keep a unique data row per primary key (PK) in ClickHouse.
I chose this column-oriented DB to serve statistics data quickly, and I am very satisfied with its speed. However, I have run into a duplicated-data issue.
The test table looks like this:
CREATE TABLE test2 (
    `uid` String COMMENT 'User ID',
    `name` String COMMENT 'name'
) ENGINE = ReplacingMergeTree
ORDER BY uid
PRIMARY KEY uid;
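(I first tried passing the key itself as the engine argument, `ReplacingMergeTree(uid)`, but as far as I understand the optional argument is a *version* column and must be a `UInt*`/`Date`/`DateTime` type, not a `String` key. A sketch of the variant with an explicit version column — the `version` field here is something I made up:)

```sql
-- Variant with an explicit version column; on merge, ClickHouse keeps the
-- row with the highest `version` per sorting key.
CREATE TABLE test2_versioned (
    `uid` String COMMENT 'User ID',
    `name` String COMMENT 'name',
    `version` UInt32
) ENGINE = ReplacingMergeTree(version)
ORDER BY uid;
```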
Let's presume that I am going to join against this table to display names (the name field). However, I can insert as many rows as I want with the same PK (sorting key).
For example:
INSERT INTO test2
(uid, name) VALUES ('1', 'User1');
INSERT INTO test2
(uid, name) VALUES ('1', 'User2');
INSERT INTO test2
(uid, name) VALUES ('1', 'User3');
SELECT * FROM test2 WHERE uid = '1';
Now I can see 3 rows with the same sorting key. Is there any way to make the key unique, or at least to prevent an insert when the key already exists?
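From what I can tell, ReplacingMergeTree only removes duplicates when parts are merged in the background, so duplicates can stay visible for an arbitrary time. I can force the collapse, but that doesn't prevent the insert itself (a sketch, assuming default settings):

```sql
-- Force a merge so older rows with the same sorting key are dropped:
OPTIMIZE TABLE test2 FINAL;

-- Or collapse duplicates at query time (slower on large tables):
SELECT * FROM test2 FINAL WHERE uid = '1';
```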
Consider the following scenario. The tables and data are:
CREATE TABLE blog (
    `blog_id` String,
    `blog_writer` String
) ENGINE = MergeTree
ORDER BY tuple();

CREATE TABLE statistics (
    `date` UInt32,
    `blog_id` String,
    `read_cnt` UInt32,
    `like_cnt` UInt32
) ENGINE = MergeTree
ORDER BY tuple();
INSERT INTO blog (blog_id, blog_writer) VALUES ('1', 'name1');
INSERT INTO blog (blog_id, blog_writer) VALUES ('2', 'name2');
INSERT INTO statistics(date, blog_id, read_cnt, like_cnt) VALUES (202007, '1', 10, 20);
INSERT INTO statistics(date, blog_id, read_cnt, like_cnt) VALUES (202008, '1', 20, 0);
INSERT INTO statistics(date, blog_id, read_cnt, like_cnt) VALUES (202009, '1', 3, 1);
INSERT INTO statistics(date, blog_id, read_cnt, like_cnt) VALUES (202008, '2', 11, 2);
And here is the summing query:
SELECT
b.writer,
a.read_sum,
a.like_sum
FROM
(
SELECT
blog_id,
SUM(read_cnt) as read_sum,
SUM(like_cnt) as like_sum
FROM statistics
GROUP BY blog_id
) a JOIN
(
SELECT blog_id, blog_writer as writer FROM blog
) b
ON a.blog_id = b.blog_id;
At this moment it works fine, but if a new row arrives, like

INSERT INTO statistics(date, blog_id, read_cnt, like_cnt) VALUES (202008, '1', 60, 0);

what I expected is that the existing (202008, '1') row is updated, so that "name1"'s read_sum becomes 10 + 60 + 3 = 73. But it shows 93 (10 + 20 + 3 + 60), since the duplicated insert is allowed.
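The only query-time workaround I can think of is collapsing statistics to one row per (date, blog_id) before summing, e.g. with anyLast, which picks an arbitrary row per group, so without a real version column it is not a guaranteed fix:

```sql
-- Collapse duplicates per (date, blog_id) before aggregating; anyLast picks
-- one row per group, which only approximates "the latest insert".
SELECT
    blog_id,
    sum(read_cnt_one) AS read_sum,
    sum(like_cnt_one) AS like_sum
FROM
(
    SELECT
        date,
        blog_id,
        anyLast(read_cnt) AS read_cnt_one,
        anyLast(like_cnt) AS like_cnt_one
    FROM statistics
    GROUP BY date, blog_id
)
GROUP BY blog_id;
```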
Is there any way to
- prevent duplicated inserts,
- or set a unique, guaranteed PK on the table?
Thanks.