How to pivot on more than one column for a spark dataframe?

Question

How can we pivot on more than one column in a dataframe. e.g. The example mentioned here, https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-pivot.html

SELECT * FROM person
    PIVOT (
        SUM(age) AS a, AVG(class) AS c
        FOR (name, age) IN (('John', 30) AS c1, ('Mike', 40) AS c2)
    );

Here Pivot is done on (name,age). We can't pass more than one parameter in Spark scala's pivot method as it only expects one column name as parameter. How can we do the similar operation for a dataframe?

mck · Accepted Answer · 2021-01-17T07:35:59.383

4

You can pass an array column to pivot:

val df2 = df.groupBy("id","address")
            .pivot(array("name","age"), 
                   Seq(array(lit("John"),lit("30")).as("c1"), 
                       array(lit("Mike"),lit("40")).as("c2")))
            .agg(sum('age).as("a"), avg('class).as("c"))

df2.show
+---+-------+----+----+----+----+
| id|address|c1_a|c1_c|c2_a|c2_c|
+---+-------+----+----+----+----+
|200|Street2|null|null|null|null|
|100|Street1|30.0| 1.0|null|null|
|400|Street4|null|null|null|null|
|300|Street3|null|null|null|null|
+---+-------+----+----+----+----+

edited Jan 17 '21 at 07:35

answered Jan 17 '21 at 07:28

mck

40,932
13
35
50

Thanks a lot for your answer! I am trying to run a similar query. project1.groupBy("mgr").pivot(array("job","deptno"), Seq(array(lit("MANAGER"),lit("20")).as("c10"), array(lit("CLERK"),lit("10")).as("m20"))).agg(sum(project1.col("sal")).alias("ss"), count("*").alias("c")).show And getting this error "java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.Column array(MANAGER, 20) AS `c10`" Any idea what could be the issue here? – Ashutosh Jan 17 '21 at 17:22
1

@Ashutosh It would be more appropriate to open another question, as there is not enough details and no sample dataframe. please include the necessary details in the new question, thanks – mck Jan 17 '21 at 19:13
Your answer has helped me a lot. Regarding my last comment lit() was not working for me it seems. I tried using withColumn() to concat two columns and passed it in pivot. It worked. project1.withColumn("p", concat($"job", $"deptno")).groupBy("mgr").pivot("p").agg(sum(project1.col("salary")).alias("ss"), count("*").alias("c")).show – Ashutosh Jan 17 '21 at 19:57

How to pivot on more than one column for a spark dataframe?

1 Answers1