
I have a PySpark dataframe as below:

Id  variable   old_val  new_val
a1  frequency  2.0      25.0
a1  latitude   25.762   25.729
a1  longitude  -80.192  -80.436
a2  frequency  1.0      5.0
a2  latitude   25.7     25.762
a2  longitude  -80.436  -80.192

I am trying to reshape the data so that the old and new values for each variable sit side by side, one row per "Id".

I would like to achieve the below ideal state:

Id  freq_old_val  freq_new_val  lat_old_val  lat_new_val  long_old_val  long_new_val
a1  2.0           25.0          25.762      25.729       -80.192       -80.436
a2  1.0           5.0           25.7        25.762       -80.436       -80.192


My attempt so far (not working):

I am unsure if I must use explode. I am also unsure whether agg can be passed two column values.

from pyspark.sql.functions import first

# agg takes one expression per output column, not two column names in one call
df.groupBy("Id").pivot("variable").agg(first("old_val"), first("new_val"))

I am fairly new to pyspark and working my way through it. Any guidance is highly appreciated. Thank you for taking the time to help.

Ridhi

1 Answer


I think a similar question is already answered here: How to pivot on multiple columns in Spark SQL?

Please comment if it is not clear.

Khalid Mammadov