
I have a PySpark dataframe as below:

Id  variable   old_val  new_val
a1  frequency  2.0      25.0
a1  latitude   25.762   25.729
a1  longitude  -80.192  -80.436
a2  frequency  1.0      5.0
a2  latitude   25.7     25.762
a2  longitude  -80.436  -80.192

I am trying to reshape the data so that the old and new values for each variable sit side by side, one row per "Id".

I would like to achieve the below ideal state:

Id  freq_old_val  freq_new_val  lat_old_val  lat_new_val  long_old_val  long_new_val
a1  2.0           25.0          25.762      25.729       -80.192       -80.436
a2  1.0           5.0           25.7        25.762       -80.436       -80.192


My attempt so far (not working):

I am unsure if I must use explode. I am also unsure whether agg can be passed two column values.

from pyspark.sql.functions import first

# agg takes one expression per output column, not two column names in one call
df.groupBy("Id").pivot("variable").agg(first("old_val"), first("new_val"))

I am fairly new to pyspark and working my way through it. Any guidance is highly appreciated. Thank you for taking the time to help.

Ridhi

1 Answer


I think a similar question is already answered here: How to pivot on multiple columns in Spark SQL?

Please comment if it is not clear.

Khalid Mammadov