0

I want to insert data without duplicates. Triplet of these fields should be unique ticker, kline_type, dateTime

Found out that i should use ReplacingMergeTree https://clickhouse.com/docs/ru/engines/table-engines/mergetree-family/replacingmergetree/

I am trying to

import clickhouse_driver

def prepare_table():
    client = clickhouse_driver.Client.from_url(f'clickhouse://default:{os.getenv("CLICK_PASSWORD")}@localhost:9000/crypto_exchange')
    
    # field names from binance API
    client.execute('''
CREATE TABLE IF NOT EXISTS historical_data_binance
(
    dateTime DateTime,
    closeTime Int64,
    open Float64,
    high Float64,
    low Float64,
    close Float64,
    volume Float64,
    kline_type String,
    ticker String
) ENGINE = ReplacingMergeTree
ORDER BY (ticker, kline_type, dateTime)
''')
    return client

prepare_table()

But I think that my solution didn't work, because I see duplicates:

  2021-11-04 11:00:00 │ 1636027199999 │ 61894.82 │  62188.78 │ 60866.46 │ 61444.74 │ 20.158382 │ 1h         │ BTCUSDT │
│ 2021-11-04 12:00:00 │ 1636030799999 │ 61420.86 │  61698.74 │ 58754.41 │ 61621.01 │ 15.721483 │ 1h         │ BTCUSDT │
└─────────────────────┴───────────────┴──────────┴───────────┴──────────┴──────────┴───────────┴────────────┴─────────┘
┌────────────dateTime─┬─────closeTime─┬─────open─┬─────high─┬──────low─┬────close─┬────volume─┬─kline_type─┬─ticker──┐
│ 2021-11-04 11:00:00 │ 1636027199999 │ 61894.82 │ 62188.78 │ 60866.46 │ 61444.74 │ 20.158382 │ 1h         │ BTCUSDT 

What is the proper way to insert data?

Alex
  • 562
  • 1
  • 6
  • 25
  • https://kb.altinity.com/engines/mergetree-table-engine-family/replacingmergetree/altinity-kb-replacingmergetree-does-not-collapse-duplicates/ – Denny Crane Nov 04 '21 at 12:59
  • https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree/ Thus, ReplacingMergeTree is suitable for clearing out duplicate data in the background in order to save space, but it does not guarantee the absence of duplicates. Plus deduplication is eventual because it's a byproduct of merges – Denny Crane Nov 04 '21 at 13:01
  • https://clickhouse.com/docs/en/sql-reference/statements/select/from/#select-from-final – Denny Crane Nov 04 '21 at 13:02
  • @DennyCrane could you please elaborate your answer. Don't understand how to receive unique values – Alex Nov 04 '21 at 13:08
  • Where is the code that inserts? I suggest adding a `UNIQUE` constraint to your `CREATE TABLE`. – Code-Apprentice Nov 04 '21 at 15:22

1 Answers1

0
  1. ReplacingMergeTree does not guarantee the absence of duplicates. You need to somehow make final deduplication on the fly in selects.

  2. Deduplication is a byproduct of merges.

create table testD ( Key Int64, ver UInt64, Value String) 
Engine=ReplacingMergeTree(ver) order by Key;

insert into testD values (1, 1, '1');
insert into testD values (1, 2, '2');

SELECT * FROM testD
┌─Key─┬─ver─┬─Value─┐
│   1 │   2 │ 2     │
└─────┴─────┴───────┘
┌─Key─┬─ver─┬─Value─┐
│   1 │   1 │ 1     │
└─────┴─────┴───────┘

1)
a) SELECT * FROM testD final
┌─Key─┬─ver─┬─Value─┐
│   1 │   2 │ 2     │
└─────┴─────┴───────┘

b) SELECT key, argMax(Value, ts) FROM testD group by key;
┌─Key─┬─argMax(Value, ver)─┐
│   1 │ 2                  │
└─────┴────────────────────┘

c) SELECT Key, Value FROM testD order by Key, Value desc limit 1 by Key;
┌─Key─┬─Value─┐
│   1 │ 2     │
└─────┴───────┘

2)

optimize table testD final; -- initiate unplanned merge

SELECT * FROM testD;
┌─Key─┬─ver─┬─Value─┐
│   1 │   2 │ 2     │
└─────┴─────┴───────┘
Denny Crane
  • 11,574
  • 2
  • 19
  • 30