
I have a PySpark DataFrame named df, shown below:

[image: input DataFrame df]

I need to pivot the data on the ProducingMonth and CLASSIFICATION columns to produce the following output:

[image: expected pivoted output]

I am using the following PySpark code:

pivotDF = df.groupBy("WELL_ID","CLASSIFICATION").pivot("CLASSIFICATION")

When I try to display the data, I get the error "'GroupedData' object has no attribute 'display'".

code_bug
  • You have an error that does not even match your code. Please add the missing piece of code. – Steven May 25 '23 at 09:20

1 Answer


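You need to perform an aggregation after the pivot. groupBy(...).pivot(...) returns a GroupedData object, not a DataFrame, so it has no display method until you call agg on it.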

from pyspark.sql import functions as F

pivotDF = df.groupBy("WELL_ID","producing_month").pivot("CLASSIFICATION").agg(
   F.first("OIL"),
   F.first("GAS"),
)

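After that, pivotDF is a regular DataFrame, so you can probably use pivotDF.display() (display is provided by Databricks notebooks; in plain PySpark, use pivotDF.show() instead).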

Steven