2

I have a array of struct and I am trying to find count, sum, distinct values of struct column.

create table temp (regionkey smallint, name string, comment string, nations array<struct<n_nationkey:smallint,n_name:string,n_comment:string>>) 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '|' 
MAP KEYS TERMINATED BY ',';

When I try to run the query

select name, 
count(nations.n_nationkey) as count, 
sum(nations.n_nationkey) as sum, 
ndv(nations.n_nationkey) as distinct_val 
from temp 
group by name 
order by name;

I get the error

FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but array<smallint> is passed.

What I want to do is find the count, sum and distinct value of n_nationkey.

Any help will be highly appreciated.

Community
  • 1
  • 1
Dereck
  • 95
  • 3
  • 12

1 Answers1

2
select      t.name 
           ,count   (e.col.n_nationkey)             as count 
           ,sum     (e.col.n_nationkey)             as sum
           ,count   (distinct e.col.n_nationkey)    as distinct_val 

from        temp t lateral view explode (t.nations) e

group by    t.name 

order by    t.name
;

For the OP

The same solution with an alias.
nations is not a struct. It is an array of structs.
It does not have an n_nationkey attribute. It has struct elements that have n_nationkey attributes.
The explode function takes an array of structs (nations) and return each struct (nation) in a separate row.

select      t.name 
           ,count   (e.nation.n_nationkey)             as count 
           ,sum     (e.nation.n_nationkey)             as sum
           ,count   (distinct e.nation.n_nationkey)    as distinct_val 

from        temp t lateral view explode (t.nations) e as nation

group by    t.name 

order by    t.name
;
David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88
  • e.col.n_nation, I understand the e and the column_name, is col a keyword or can we just do e.n_nationkey? Because when I ran the above code i got this error message 'FAILED: SemanticException [Error 10025]: Line 1:20 Expression not in GROUP BY key 'col' – Dereck Jun 22 '17 at 00:24
  • Dereck, check the updated answer. The reason you are struggling here is because you haven't supply a data sample. If you want to understand data issues you should look on data, not code. – David דודו Markovitz Jun 22 '17 at 04:22
  • Dudu, Thank you so much, this has helped me a lot, I just got one confusion, isn't e already an alias for each struct? Thanks again. – Dereck Jun 22 '17 at 21:09
  • `e` is an alias for the table function in the same manner `t` is an alias to a table. – David דודו Markovitz Jun 22 '17 at 22:19