KSQL create table with multi-column aggregation

Question

So essentially I'd like to group by two columns, like this:

CREATE TABLE foo AS
SELECT a, b, SUM(a)
FROM whatever
GROUP BY a, b

whatever is a stream in Kafka format. When I issue the command, ksql returns:

Key format does not support schema.
format: KAFKA
schema: Persistence{columns=[`a` STRING KEY, `b` STRING KEY], features=[]}
reason: The 'KAFKA' format only supports a single field. Got: [`a` STRING KEY, `b` STRING KEY]
Caused by: The 'KAFKA' format only supports a single field. Got: [`DEVICE`
    a KEY, `b` STRING KEY]

The problem is that the Kafka format does not support multi-column keys. Is there a way to workaround this, e. g. creating an artificial key in this table? I did not manage to do this.

I saw someone posted a similiar question and the answer seemed to work. I suggest that's because of the format. https://stackoverflow.com/a/50193239/9632703

The documentation mentions that multi-column aggregations might not work, though also saying that ksql does a background workaround to make it work. Unfortunately ksql only returns the given error message. https://www.confluent.de/blog/ksqldb-0-10-updates-key-columns/#multi-column-aggregations

The funny part is that omitting the first line CREATE TABLE foo AS works. So if some data comes in, the aggregation works. But that's not persistent, of course. If nothing else works, I would also be fine with having a table without a primary key defined, if possible in ksql, since I could still identify the data with {a, b} in my application.

Can someone help me? Thank you.

score 5 · Accepted Answer · answered Feb 26 '21 at 12:15

You can do this if you upgrade to ksqlDB 0.15. This version introduced multi-key support. You'll need to use a KEY_FORMAT that supports it.

ksql> CREATE TABLE FOO AS SELECT A, B, SUM(C) FROM TEST_STREAM GROUP BY A,B;

 Message
-----------------------------------
 Created query with ID CTAS_FOO_53
-----------------------------------
ksql> DESCRIBE FOO;

Name                 : FOO
 Field      | Type
---------------------------------------------
 A          | BIGINT           (primary key)
 B          | VARCHAR(STRING)  (primary key)
 KSQL_COL_0 | DOUBLE
---------------------------------------------
For runtime statistics and query details run: DESCRIBE EXTENDED <Stream,Table>;

ksql> SELECT * FROM FOO EMIT CHANGES LIMIT 5;
+---------------------------+---------------------------+---------------------------+
|A                          |B                          |KSQL_COL_0                 |
+---------------------------+---------------------------+---------------------------+
|220071000                  |AIS                        |0.4                        |
|257838000                  |AIS                        |6.2                        |
|538007854                  |AIS                        |22.700000000000003         |
|257487000                  |AIS                        |2.4                        |
|257601800                  |AIS                        |5.8999999999999995         |
Limit Reached
Query terminated

I will look into it in the next days. Thanks Robin. In the meantime, I've managed to create said table by changing the FORMAT to DELIMITED. — Richard Steinbrecht, Feb 27 '21 at 17:36
@robin i still get the error with version 0.15 my key format is KAFKA. — kailash yogeshwar, Apr 08 '21 at 10:04
@kailashyogeshwar please open a new question with full details & steps to reproduce. — Robin Moffatt, Apr 11 '21 at 17:16

KSQL create table with multi-column aggregation

1 Answers1