
I have a table in the following format

    ID  Property  Value
    1   name      Tim
    1   location  USA
    1   age       30
    2   name      Jack
    2   location  UK
    2   age       27

And I would like an output in the following format

    ID  name  location  age
    1   Tim   USA       30
    2   Jack  UK        27

In Python I can do:

    table_agg = table.groupby('ID')[['Property','Value']].apply(lambda x: dict(x.values))
    p = pd.DataFrame(list(table_agg))

How to write the query in Hive?
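For reference, the same reshape can be done in pandas without the groupby/apply round-trip by using the built-in `pivot`; a minimal, self-contained sketch (the DataFrame construction is illustrative, mirroring the sample table above):

```python
import pandas as pd

# Long-format input, as in the question
table = pd.DataFrame({
    'ID':       [1, 1, 1, 2, 2, 2],
    'Property': ['name', 'location', 'age', 'name', 'location', 'age'],
    'Value':    ['Tim', 'USA', '30', 'Jack', 'UK', '27'],
})

# pivot turns each distinct Property value into its own column
wide = table.pivot(index='ID', columns='Property', values='Value').reset_index()
wide.columns.name = None                         # drop the leftover 'Property' axis label
wide = wide[['ID', 'name', 'location', 'age']]   # restore the desired column order
print(wide)
```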

Barbaros Özhan
lgao

1 Answer


You can use the collect_list and map functions to group the data, then access the resulting array of maps by key.

Example:

hive> create table t1(id int, property string, valu string) stored as orc;
hive> insert into t1 values (1,"name","Tim"),(1,"location","USA"),(1,"age","30"),
                            (2,"name","Jack"),(2,"location","UK"),(2,"age","27");

hive> select id,
             va[0]["name"]     as name,
             va[1]["location"] as location,
             va[2]["age"]      as age
        from (
             select id, collect_list(map(property, valu)) as va
               from t1
              group by id
            ) t;

Result:

id      name    location        age
1       Tim     USA             30
2       Jack    UK              27
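As the comments below note, some Hive builds reject this with `UDFArgumentTypeException Only primitive type arguments are accepted`, because their collect_list does not accept map arguments. A portable workaround is a conditional-aggregation pivot, which only ever aggregates primitive values. A sketch of that pattern, demonstrated here against SQLite via Python since the SQL itself is generic (in Hive, run the same select against your own table):

```python
import sqlite3

# In-memory table standing in for the Hive table; only the DDL syntax differs.
con = sqlite3.connect(':memory:')
con.execute("create table t1(id int, property text, value text)")
con.executemany("insert into t1 values (?,?,?)", [
    (1, 'name', 'Tim'),  (1, 'location', 'USA'), (1, 'age', '30'),
    (2, 'name', 'Jack'), (2, 'location', 'UK'),  (2, 'age', '27'),
])

# Conditional aggregation: one max(case ...) per target column.
# The case expression yields NULL for non-matching rows, and max()
# ignores NULLs, so each column keeps the single matching value.
rows = con.execute("""
    select id,
           max(case when property = 'name'     then value end) as name,
           max(case when property = 'location' then value end) as location,
           max(case when property = 'age'      then value end) as age
      from t1
     group by id
     order by id
""").fetchall()
print(rows)   # [(1, 'Tim', 'USA', '30'), (2, 'Jack', 'UK', '27')]
```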
notNull
  • Thanks @Shu! I had an error trying this though. The error message is: UDFArgumentTypeException Only primitive type arguments are accepted but map was passed as parameter 1. I guess that is because map is not considered a primitive type in Hive? Any thoughts on how to get around this? – lgao Jul 12 '19 at 20:28
  • @lgao weird!, I added `my create table statements` and able to run the query without any issues in **Hive 1.1 and 2.1**. if you are having `different datatypes` then add your hive create table statement to the question and error that you are getting! – notNull Jul 12 '19 at 21:06
  • I am also facing the same issue: FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but map was passed as parameter 1 – vikrant rana Aug 20 '19 at 17:40
  • @dhnsh. I was looking at below problem to solve it in hive but facing an issue while collecting map values like you did. https://stackoverflow.com/questions/57541507/effective-way-to-groupby-without-using-pivot-in-pyspark?noredirect=1#comment101547274_57541507 – vikrant rana Aug 20 '19 at 17:54