I have a CSV file formatted like the following:

Attr1 Attr2 10/1/22 10/2/22 10/3/22 etc.
Red Square 5 10 12 0
Blue Square 11 8 2 1
Red Circle 1 12 3 4
Blue Circle 3 5 7 6

I can load this into a dataframe, but I want to get it into this format:

Attr1 Attr2 Date Qty
Red Square 10/1/22 5
Red Square 10/2/22 10
Red Square 10/3/22 12
etc. . . .
etc. . . .
Blue Circle 10/1/22 3
Blue Circle 10/2/22 5
Blue Circle 10/3/22 7

Issues:

  1. The number of columns is variable: there is one per day, so a new column is added each day.
  2. I want to "explode" the date columns into one row per day while keeping the "attribute" columns.

This is purely a reformatting issue. No aggregation or calculation is needed.

Any ideas how to proceed? Thank you.

  • 2
    Does this answer your question? [How to melt Spark DataFrame?](https://stackoverflow.com/questions/41670103/how-to-melt-spark-dataframe) Or, if you are using PySpark 3.2+ (not sure of the exact version), you can check the pandas API on Spark: https://spark.apache.org/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.melt.html – Emma Nov 08 '22 at 15:42
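
For reference, the melt approach from the comment can be sketched in plain pandas. This is only an illustration under some assumptions: the file is comma-separated, the two attribute columns are named `Attr1` and `Attr2` as in the sample, and the inline `csv_text` stands in for the real file. The `melt` method in the pandas API on Spark linked above should accept the same `id_vars`, `var_name`, and `value_name` arguments, but that is not tested here.

```python
import io

import pandas as pd

# Assumption: a comma-separated stand-in for the real CSV file,
# using the first three date columns from the sample in the question.
csv_text = """Attr1,Attr2,10/1/22,10/2/22,10/3/22
Red,Square,5,10,12
Blue,Square,11,8,2
Red,Circle,1,12,3
Blue,Circle,3,5,7
"""

wide = pd.read_csv(io.StringIO(csv_text))

# Only the attribute columns are named explicitly. Every other column is
# treated as a date column, so new daily columns are picked up automatically.
id_cols = ["Attr1", "Attr2"]
long = wide.melt(id_vars=id_cols, var_name="Date", value_name="Qty")

print(long)
```

Because only the attribute columns are listed, the code does not need to change as date columns are added. The result has one row per attribute pair and date; the rows come out grouped by date column rather than by attribute pair, so sort afterwards if the order matters.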
