4

We can use an AggregatedMergeTree table engine, which can be used for a aggregating rows.

Generally in aggregated data we are not interested in storing all unique identifiers and still want to do a count distinct. Still we want to have the ability to do another aggregation to get unique count on these rows afterwards (trough grouping rows in a select query). This is where HyperLogLog comes in handy, which is implemented as the uniqState function in clickhouse.

I would like to store a hyperloglog directly trough an insert query and offer it to a clickhouse table from my client application. Is this possible?

Jason Aller
  • 3,541
  • 28
  • 38
  • 38
RoyB
  • 3,104
  • 1
  • 16
  • 37

1 Answers1

3

So I achieved this feat using just a clickhouse query. Its working very well!

CREATE TABLE demo_db.aggregates
(
    name String,
    date Date,
    ids AggregateFunction(uniq, UInt8)
) ENGINE = MergeTree(date, date, 8192)

//So here the declaration of a set of ids in the insert query will lead to a binary hash tree being stored    
INSERT INTO aggregates SELECT
    'Demo',
    toDate('2016-12-03'),
    uniqState(arrayJoin([1, 5, 6, 7])) 

SELECT
    name,
    date,
    uniqMerge(ids) //our hashtree can be grouped and give us unique count over the grouped rows
FROM aggregates
GROUP BY name, date
RoyB
  • 3,104
  • 1
  • 16
  • 37
  • In addition to this, be aware of the AggregatingMergeTree table engine, which is very capable of building up uniq state and adding to this. The AggregatingMergeTree seems to be the best candidate for most usecases where we want to aggregate seperate inserts. – RoyB Jan 10 '18 at 13:59