I have a data frame that looks like:

Region, 2000Q1, 2000Q2, 2000Q3, ...
A, 1,2,3,...

I want to transpose this wide table to a long table by 'Region'. So the final product will look like:

Region, Time, Value
A, 2000Q1, 1
A, 2000Q2, 2
A, 2000Q3, 3
A, 2000Q4, 4
....

The original table has a very wide array of columns, but the aggregation level is always Region, and all remaining columns are to be transposed.

Do you know an easy way or function to do this?

Yi Du
  • what you are trying to do is a melt. You should try something like in [that](https://stackoverflow.com/questions/41670103/how-to-melt-spark-dataframe). – Seb Aug 20 '20 at 19:10
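For reference, the melt described in that link boils down to SQL's stack; a minimal sketch, assuming the sample columns from the question (backticks are needed because the names start with a digit, and all value columns must share a type):

# melt via stack(): 3 name/value pairs collapse into (Time, Value) rows
df.selectExpr(
    "Region",
    "stack(3, '2000Q1', `2000Q1`, '2000Q2', `2000Q2`, '2000Q3', `2000Q3`) as (Time, Value)"
).show()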

2 Answers

Try the arrays_zip function, then explode the resulting array.

Example:

df = spark.createDataFrame([('A',1,2,3)], ['Region','2000q1','2000q2','2000q3'])

from pyspark.sql.functions import *

# every column except Region, plus a pipe-delimited string of those names
cols = [c for c in df.columns if c != 'Region']
col_name = "|".join(cols)

# zip the values with the column names, then explode one row per pair
df.withColumn("cc",explode(arrays_zip(array(*cols),split(lit(col_name),"\\|")))).\
select("Region","cc.*").\
toDF(*['Region','Value','Time']).\
show()
#+------+-----+------+
#|Region|Value|  Time|
#+------+-----+------+
#|     A|    1|2000q1|
#|     A|    2|2000q2|
#|     A|    3|2000q3|
#+------+-----+------+
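As an aside, Spark 3.4+ ships this reshape as a built-in: DataFrame.unpivot (melt is an alias). A minimal sketch against the same sample frame:

# built-in unpivot, Spark 3.4+ only; melt() is an alias
cols = [c for c in df.columns if c != 'Region']
df.unpivot('Region', cols, 'Time', 'Value').show()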
notNull

A similar approach, but with an improved column calculation.

# all columns except Region get unpivoted
cols = df.columns
cols.remove('Region')

import pyspark.sql.functions as f

# zip the literal column names with the column values, then explode
# (the output below assumes a sample frame with columns 2000Q1 .. 2000Q5)
df.withColumn('array', f.explode(f.arrays_zip(f.array(*map(lambda x: f.lit(x), cols)), f.array(*cols)))) \
  .select('Region', 'array.*') \
  .toDF('Region', 'Time', 'Value') \
  .show(30, False)

+------+------+-----+
|Region|Time  |Value|
+------+------+-----+
|A     |2000Q1|1    |
|A     |2000Q2|2    |
|A     |2000Q3|3    |
|A     |2000Q4|4    |
|A     |2000Q5|5    |
+------+------+-----+

p.s. Don't accept this as an answer :)

Lamanus