
I have the PySpark dataframe below:

+-----+----+----+----+----+
|Month|Col1|Col2|Col3|Col4|
+-----+----+----+----+----+
|    1|  10|  20|  30|  40|
|    2|  20|  30|  50|  60|
|    3|  30|  50|  60|  70|
|    4|  30|  40|  50|  80|
+-----+----+----+----+----+

I need to convert the dataframe in the following manner: unpivot each column into its own row and pair it with the corresponding month.

The desired output has one row per (Month, original column) pair, with columns Month, Col_name, and Value, as shown in the first answer's output below.

I have tried the following approach, df.groupBy("Month").pivot("col1"), but I am not getting the desired output.

  • Does this answer your question? [How to melt Spark DataFrame?](https://stackoverflow.com/questions/41670103/how-to-melt-spark-dataframe) – samkart Mar 15 '23 at 06:20

2 Answers


You can use pandas_api() to enable the pandas API on a Spark DataFrame (not to be confused with to_pandas()):

# melt via the pandas-on-Spark API, then convert back to a Spark DataFrame
sdf = (sdf.pandas_api()
          .melt('Month', var_name='Col_name', value_name='Value')
          .to_spark())

Output:

>>> sdf.show()
+-----+--------+-----+
|Month|Col_name|Value|
+-----+--------+-----+
|    1|    Col1|   10|
|    1|    Col2|   20|
|    1|    Col3|   30|
|    1|    Col4|   40|
|    2|    Col1|   20|
|    2|    Col2|   30|
|    2|    Col3|   50|
|    2|    Col4|   60|
|    3|    Col1|   30|
|    3|    Col2|   50|
|    3|    Col3|   60|
|    3|    Col4|   70|
|    4|    Col1|   30|
|    4|    Col2|   40|
|    4|    Col3|   50|
|    4|    Col4|   80|
+-----+--------+-----+
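
On Spark 3.4+ the same melt is also available natively as DataFrame.unpivot, which avoids the round trip through the pandas-on-Spark API. A minimal sketch, assuming sdf has the Month/Col1..Col4 schema from the question:

# native unpivot (Spark 3.4+); produces the same output as the melt above
sdf2 = sdf.unpivot(
    ids=["Month"],                             # identifier column(s) to keep
    values=["Col1", "Col2", "Col3", "Col4"],   # columns to turn into rows
    variableColumnName="Col_name",
    valueColumnName="Value",
)
sdf2.show()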
– Corralien

You can use the PySpark code below to transform the columns into rows; the stack expression unpivots all four value columns in a single select:

df = spark.createDataFrame([(1, 10, 20, 30, 40),
                            (2, 20, 30, 50, 60),
                            (3, 30, 50, 60, 70),
                            (4, 30, 40, 50, 80)],
                           ["Month", "Col1", "Col2", "Col3", "Col4"])

# stack(n, label1, expr1, ..., labeln, exprn) emits n rows per input row,
# turning each (label, value) pair into a (Col_name, Value) row
melted_df = df.selectExpr(
    "Month",
    "stack(4, 'Col1', Col1, 'Col2', Col2, 'Col3', Col3, 'Col4', Col4) as (Col_name, Value)")

melted_df.show()
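
If the value columns are not known in advance, the stack() arguments can be built from the schema instead of hard-coded. A short sketch, assuming every column except Month should be unpivoted:

value_cols = [c for c in df.columns if c != "Month"]
stack_args = ", ".join(f"'{c}', {c}" for c in value_cols)

# same stack-based melt, with the argument list generated dynamically
melted_df = df.selectExpr(
    "Month",
    f"stack({len(value_cols)}, {stack_args}) as (Col_name, Value)")

melted_df.show()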