Conditional Group By on Selected rows in KDB/Q-Sql

Question

I have a requirement where I have to execute multiple queries and perform group by on a column with where clause , group by column is fixed and where condition will be perform on fixed column with variable criteria . Only Column name and aggregation type will be varies For example if I have table :

k1  k2  val1    val2
1   1   10      30
1   1   20      31
1   2   30      32
2   2   40      33
2   3   50      34
2   4   60      35
2   4   70      36
3   4   80      37
3   5   90      38
3   5   100     39


t:([] k1:1 1 1 2 2 2 2 3 3 3; k2:1 1 2 2 3 4 4 4 5 5; val1:10 20 30 40 50 60 70 80 90 100; val2:31 31 32 33 34 35 36 37 38 39)

Queries which I need to perform will be like

 select avg_val1:avg val1 by k1 from t where k2 in 2 3 4
 select sum_val1:sum val1 by k1 from t where k2 in 2 3 
 select sum_val2:sum val2 by k1 from t where k2 in 2 3 5
 select min_val2:min val2 by k1 from t where k2 in 1 2 3 4 5

I want to execute these queries in a single execution using functional queries. I tried this, but not able to put right condition and syntax

res:?[t;();(enlist`k1)!enlist`k1;(`avg_val1;`sum_val2)!({$[x; y; (::)]}[1b;(avg;`val1)];{$[x; y; (::)]}[1b; (sum;`val2)])];

k1  avg_val1 sum_val2
1   20.0      94
2   55.0      138
3   90.0      114

Instead putting 1b in condition , i want to put real condition like this:

res:?[t;();(enlist`k1)!enlist`k1;(`avg_val1;`sum_val2)!({$[x; y; (::)]}[(in;`k2;2 3 4i);(avg;`val1)];{$[x; y; (::)]}[(in;`k2;2 3i); (sum;`val2)])];

But it will give "type" error, since query will be first group by k1 ,and k2 will be list. So condition is also not right.

I want to know what can be the best solution for this. May be there can be better approach to solve the same . Please help me to in same.

Thank you.

Jorge Sawyer · Accepted Answer · 2020-06-26T13:46:35.027

The vector conditional (?) operator can get you closer to what you'd like.

Given your table

t:([] k1:1 1 1 2 2 2 2 3 3 3; k2:1 1 2 2 3 4 4 4 5 5; val1:10 20 30 40 50 60 70 80 90 100; val2:31 31 32 33 34 35 36 37 38 39)
k1 k2 val1 val2
---------------
1  1  10   31  
1  1  20   31  
1  2  30   32  
2  2  40   33  
2  3  50   34  
2  4  60   35  
2  4  70   36  
3  4  80   37  
3  5  90   38  
3  5  100  39

you can update, say, the val1 column to hold null values wherever a condition does not hold

update val1:?[k2 in 2 3 4;val1;0N] from t
k1 k2 val1 val2
---------------
1  1       31  
1  1       31  
1  2  30   32  
2  2  40   33  
2  3  50   34  
2  4  60   35  
2  4  70   36  
3  4  80   37  
3  5       38  
3  5       39

and with a little more work you can get the desired aggregate (NB: the aggregate functions ignore null values)

select avg ?[k2 in 2 3 4;val1;0N] by k1 from t
k1| x 
--| --
1 | 30
2 | 55
3 | 80

You can wrap this up into a functional select statement like so

?[t;();{x!x}enlist`k1;`avg_val1`sum_val2!((avg;(?;(in;`k2;2 3 4);`val1;0N));(sum;(?;(in;`k2;2 3);`val2;0N)))]
k1| avg_val1 sum_val2
--| -----------------
1 | 30       32      
2 | 55       67      
3 | 80       0

However, this can break when you use an function that does not ignore nulls, e.g. count. You may be better off using the where operator in you select statement:

select avg val1 where k2 in 2 3 4 by k1 from t
k1| x 
--| --
1 | 30
2 | 55
3 | 80

?[t;();{x!x}enlist`k1;`avg_val1`sum_val2!((avg;(`val1;(where;(in;`k2;2 3 4))));(sum;(`val2;(where;(in;`k2;2 3)))))]
k1| avg_val1 sum_val2
--| -----------------
1 | 30       32      
2 | 55       67      
3 | 80       0

Thanks @Jorge for response. Updating column will impact the result, for example if we execute: "select avg_val1:avg val1 by k1 from t where k2 in 2 3 4" it will result : 1 30.0 2 55.0 3 80.0 While with your solution , it will be k1| avg_val1 --| ---------- 1 | 10 2 | 55 3 | 26.666 It will impact aggregations where no of rows is important, like count, avg etc, We need to select only those rows which are in k2 without modifying table. Since table is very large and modifying the same will defeat the purpose. — sam, Jun 26 '20 at 13:15
Ah, of course. Apologies for that. I've updated the answer to replace with null values rather than zeroes. kdb+ aggregates such as avg, min, max, and sum ignore nulls. Count will still be a problem but you can get around that by applying sum to not null, or summing signum of a val column rather than counting it. — Jorge Sawyer, Jun 26 '20 at 13:37
@sam I've also added a solution that uses the `where` operator in the aggregation. This is a more robust solution which will be able to handle other function such as `count`. — Jorge Sawyer, Jun 26 '20 at 13:47

Conditional Group By on Selected rows in KDB/Q-Sql

1 Answers1