22

I'm new to AWS Athena and trying to pivot some rows into columns, similar to the top answer in this StackOverflow post.

However, when I tried:

SELECT column1, column2, column3
FROM data
PIVOT
(
  MIN(column3)
  FOR column2 IN ('VALUE1','VALUE2','VALUE3','VALUE4')
)

I get the error: mismatched input '(' expecting {',', ')'} (service: amazonathena; status code: 400; error code: invalidrequestexception

Does anyone know how to accomplish what I am trying to achieve in AWS Athena?

Community
  • 1
  • 1
Louis
  • 1,123
  • 5
  • 15
  • 24

3 Answers3

21

Extending @kadrach 's answer. Assuming a table like this

uid | key | value1 | value2
----+-----+--------+--------
 1  |  A  |  10    | 1000
 1  |  B  |  20    | 2000
 2  |  A  |  11    | 1001
 2  |  B  |  21    | 2001

Single column PIVOT works like this

SELECT
  uid,
  kv1['A'] AS A_v1,
  kv1['B'] AS B_v1
FROM (
  SELECT uid, map_agg(key, value1) kv1
  FROM vtable
  GROUP BY uid
) 

Result:

uid | A_v1 | B_v1 
----+------+-------
 1  |  10  |  20   
 2  |  11  |  21  

Multi column PIVOT works like this

SELECT
  uid,
  kv1['A'] AS A_v1,
  kv1['B'] AS B_v1,
  kv2['A'] AS A_v2,
  kv2['B'] AS B_v2
FROM (
  SELECT uid,
      map_agg(key, value1) kv1,
      map_agg(key, value2) kv2
  FROM vtable
  GROUP BY uid
) 

Result:

uid | A_v1 | B_v1 | A_v2 | B_v2 
----+------+------+------+-----
 1  |  10  |  20  | 1000 | 2000
 2  |  11  |  21  | 1001 | 2001

Kristian Sköld
  • 253
  • 3
  • 7
  • 1
    Does it mean I have to know in advance all the unique values in the `key` column? – Dror Feb 17 '20 at 14:14
  • 3
    with this approach unfortunately yes. There might certainly be a more generic solution (using joins for example) but I have yet not had the need to create that. – Kristian Sköld Feb 20 '20 at 21:44
  • 4
    I've tried to find a generic solution, which allows to create the number of columns based on the values dynamically - but presto doesn't seem to offer something like this. My workaround would be to create the corresponding SQL dynamically based on the outcome of an initial query. But that always proved to be too much of an hassle and in our cases we always new the few possible values. – Kristian Sköld Feb 20 '20 at 22:43
  • how are a count column ? – Cristián Vargas Acevedo Jul 06 '22 at 20:05
9

You can do a single-column PIVOT in Athena using map_agg.

SELECT
  uid,
  kv['c1'] AS c1,
  kv['c2'] AS c2,
  kv['c3'] AS c3
FROM (
  SELECT uid, map_agg(key, value) kv
  FROM vtable
  GROUP BY uid
) t

Credit goes to this website. Unfortunately I've not found a clever way to do a multi-column pivot this way (I nest the query, which is not pretty).

kadrach
  • 408
  • 6
  • 11
  • This was very helpful. Thank you. I would like to add that, key and value arguments in map_agg() function, would be the existing column which you want to expand and the value will be the value associated with that column. For e.g. key = species, value= count. – Saumya Mehta May 20 '21 at 12:32
  • The url is deprecated now, u can try [this website](https://docs.treasuredata.com/display/public/PD/SQL+Tips+for+Hive+and+Presto#SQLTipsforHiveandPresto-PIVOT) for docs. – Sr Jefers Nov 24 '21 at 22:18
2

I had the same issue with using PIVOT function. However I used a turn around way to obtain a similar format data set :

select 
  columnToGroupOn,
  min(if(colToPivot=VALUE1,column3,null)) as VALUE1, 
  min(if(colToPivot=VALUE2,column3,null)) as VALUE2, 
  min(if(colToPivot=VALUE3,column3,null)) as VALUE3 
from
    data
group by columnToGroupOn           
  • 1
    I think you should be using "max" instead of "min" because in the "if" expression you return 0 for not-matching-rows. – Michael Yakobi Jun 18 '18 at 07:54
  • 1
    Yes , you are right, however OP requires a solution for obtaining the min(column3) and using max will do the opposite. I think the correct solution would be using null instead of 0 as the default value. corrected my answer. – Resham Wadhwa Jun 18 '18 at 11:03