
I have a Spark table, df, that serves as an archive of revenue over time and is re-processed each day. It has the following schema:

 |-- Date: date (nullable = true)
 |-- 2018-09-10: double (nullable = true)
 |-- 2018-09-17: double (nullable = true)
 |-- 2018-09-24: double (nullable = true)
 |-- 2018-09-25: double (nullable = true)
 |-- 2018-09-26: double (nullable = true)
 |-- 2018-09-27: double (nullable = true)
 |-- 2018-09-28: double (nullable = true)
...

There are currently about 500 columns in the table, with a new column added every day that represents the historical revenue recognized over time.
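For reference, a small reproducible version of the input can be built like this (the values and column names are made up to match the schema above, not real data):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Tiny stand-in for the wide archive: one row per revenue-recognized date,
    # one double column per processing date (illustrative values only).
    df = spark.createDataFrame(
        [("2018-09-01", 100.0, 110.0, 120.0),
         ("2018-09-02", 200.0, 210.0, 220.0)],
        ["Date", "2018-09-10", "2018-09-17", "2018-09-24"],
    ).withColumn("Date", F.to_date("Date"))

    df.printSchema()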

I want the table to have the following schema:

 |-- Revenue Recognized Date: date (nullable = true)
 |-- Processed Date: date (nullable = true)
 |-- Revenue: double (nullable = true)

What might be the best way to transform the first table into the second schema? I have not had much luck with the unpivot() function so far.
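The direction I have been trying is a melt-style approach: build a struct per date column, collect the structs into an array, and explode it into rows. A minimal sketch of what I mean, assuming every column other than Date is a processing-date string that casts cleanly to a date (I have not verified this against the full 500-column table):

    from pyspark.sql import functions as F

    # Every column except "Date" is assumed to be a processing-date column.
    value_cols = [c for c in df.columns if c != "Date"]

    long_df = (
        df.select(
            F.col("Date").alias("Revenue Recognized Date"),
            F.explode(
                F.array(*[
                    F.struct(
                        F.lit(c).cast("date").alias("Processed Date"),
                        F.col(f"`{c}`").alias("Revenue"),  # backticks guard the dashes in the column names
                    )
                    for c in value_cols
                ])
            ).alias("kv"),
        )
        .select("Revenue Recognized Date", "kv.*")  # flatten the struct into the remaining two columns
    )

    long_df.printSchema()

(I believe newer Spark releases have a built-in DataFrame.unpivot/melt that does this directly; the sketch above relies only on explode, struct, and array.)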

C. Dubz
  • It would help if you could provide reproducible input data and the desired output; you could refer to https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-examples. And welcome to SO – murtihash Feb 27 '20 at 01:26
  • What you need is a melt function for pyspark. Have a look at this answer: https://stackoverflow.com/a/41673644/8971668 – ndricca Feb 27 '20 at 09:37

0 Answers