1

I'd like to have a table, which only contains the top n rows per group. Consider this table:

Field     | Type                           
--------------------------------------------
 a        | VARCHAR(STRING)  (primary key) 
 b        | VARCHAR(STRING)  (primary key) 
 c        | VARCHAR(STRING)  (primary key) 
 d        | INTEGER                        

For every group (denoted by the primary key), I need e. g. the 10 rows with the highest value in column d. d is an aggregation over {a, b, c}, which sums up column c. This works pretty easy in normal SQL with ROW_NUMBER(), as described here: How do I use ROW_NUMBER()? , where you simply assign a number to every row, which depicts the row's placement in descending order depending on the value of column d.

Unfortunately, ksql doesn't support subqueries yet, which you need for ROW_NUMBER(). https://github.com/confluentinc/ksql/issues/745 I'm also not quite sure whether ksql supports ROW_NUMBER(), I think not, since I haven't found anything in the documentation and didn't manage to run it by myself.

I also found the TOPK function in ksql, but that doesn't seem to work as expected. https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/aggregate-functions/#topk

There is an issue about it on GitHub https://github.com/confluentinc/ksql/issues/403 . When I run it by myself, I obtain a column with an array with the top n values for {a, b}, which I can't map to the according values of column c. So that doesn't work as well. Example of what TOPK yields for k = 5:

a     | b     | d                         
--------------------------------------------
val1  | val2  | [10, 8, 6, 4, 2]
val1  | val4  | [7, 3, 3, 1, 0]
val1  | val6  | [5, 4, 3, 2, 1]             

Here is an example of what I actually need, assuming that {10, 8, 6, 4, 2} are the 5 biggest values of d. For {a, b}, I want the values of c with the 5 biggest values of d.

a     | b     | c     | d                         
--------------------------------------------
val1  | val2  | val3  | 10
val1  | val2  | val8  | 8
val1  | val2  | val9  | 6
val1  | val2  | val10 | 4
val1  | val2  | val11 | 2             

Now, is there any possibility to do a top-n query in ksql? Or is it on the roadmap for future releases? Thanks.

OneCricketeer
  • 179,855
  • 19
  • 132
  • 245

0 Answers0