I have sample data as shown below:
ProductionDate | CostCenterKey | AEMainCategoryKey | AELossMagnitude |
---|---|---|---|
1/1/2022 | 100030 | OAE | 84.94871412 |
1/1/2022 | 100030 | Overdeclared | -0.010897228 |
1/1/2022 | 100030 | UL | 15.06218311 |
1/1/2022 | 100040 | OAE | 49.99603575 |
1/1/2022 | 100040 | UL | 50.00001425 |
1/1/2022 | 100040 | Undeclared | 0.003950003 |
1/2/2022 | 100030 | OAE | 71.58823183 |
1/2/2022 | 100030 | UL | 28.36946736 |
1/2/2022 | 100030 | Undeclared | 0.042300804 |
1/2/2022 | 100040 | OAE | 49.99702425 |
1/2/2022 | 100040 | UL | 50.00002575 |
1/2/2022 | 100040 | Undeclared | 0.002950002 |
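For reproducibility, the sample can be built as a small Spark DataFrame, as in the sketch below (the SparkSession name `spark` is assumed, as on Databricks):

```python
# A minimal sketch that reproduces the sample data as a Spark DataFrame,
# assuming a SparkSession is available as `spark`.
data = [
    ("1/1/2022", 100030, "OAE", 84.94871412),
    ("1/1/2022", 100030, "Overdeclared", -0.010897228),
    ("1/1/2022", 100030, "UL", 15.06218311),
    ("1/1/2022", 100040, "OAE", 49.99603575),
    ("1/1/2022", 100040, "UL", 50.00001425),
    ("1/1/2022", 100040, "Undeclared", 0.003950003),
    ("1/2/2022", 100030, "OAE", 71.58823183),
    ("1/2/2022", 100030, "UL", 28.36946736),
    ("1/2/2022", 100030, "Undeclared", 0.042300804),
    ("1/2/2022", 100040, "OAE", 49.99702425),
    ("1/2/2022", 100040, "UL", 50.00002575),
    ("1/2/2022", 100040, "Undeclared", 0.002950002),
]
spark_df = spark.createDataFrame(
    data, ["ProductionDate", "CostCenterKey", "AEMainCategoryKey", "AELossMagnitude"]
)
```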
I need to pivot the AEMainCategoryKey column so that there is one row per combination of ProductionDate and CostCenterKey, with the output in the form below:
ProductionDate | CostCenterKey | OAE | Overdeclared | UL | Undeclared |
---|---|---|---|---|---|
1/1/2022 | 100030 | 84.94871412 | -0.010897228 | 15.0621831 | 0 |
1/1/2022 | 100040 | 49.99603575 | 0 | 50.0000143 | 0.00395 |
1/2/2022 | 100030 | 71.58823183 | 0 | 28.3694674 | 0.0423008 |
1/2/2022 | 100040 | 49.99702425 | 0 | 50.0000258 | 0.00295 |
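For reference, the same reshape can apparently be expressed directly in PySpark with `groupBy().pivot()`, without the pandas round-trip; a sketch of what I am trying to achieve (assuming `spark_df` holds the sample rows above):

```python
# A sketch of the target output in pure PySpark, assuming spark_df holds
# the sample rows shown above.
from pyspark.sql import functions as F

pivoted_df = (
    spark_df
    .groupBy("ProductionDate", "CostCenterKey")   # one row per combination
    .pivot("AEMainCategoryKey")                   # category values become columns
    .agg(F.first("AELossMagnitude"))              # one measurement per cell
    .fillna(0)                                    # absent categories become 0
)
```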
I am writing the code below, but it is not producing the desired output.
    from pyspark.sql import SparkSession
    import pandas as pd

    ## Creating a Spark DataFrame
    spark_df = sqlContext.sql("select * from hive_metastore.asseteffectiveness.asset_effectiveness_maincat where productiondate in ('2022-01-01','2022-01-02') and costcenterkey in (100030,100040)")

    ## Converting to a pandas DataFrame
    pandas_df = spark_df.toPandas()

    pandas_df.pivot_table(index=['ProductionDate','CostCenterKey'], columns=['AEMainCategoryKey'], values='AELossMagnitude', fill_value=0)

    display(pandas_df)
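From the pandas documentation I understand that `pivot_table` returns a new DataFrame rather than modifying `pandas_df` in place, so I suspect the result has to be captured and its index flattened; a sketch of what I mean, using the same column names as above:

```python
# pivot_table returns a new DataFrame; capture it instead of displaying the
# unchanged pandas_df. reset_index() turns the (ProductionDate, CostCenterKey)
# MultiIndex back into ordinary columns.
pivot_df = pandas_df.pivot_table(
    index=["ProductionDate", "CostCenterKey"],
    columns="AEMainCategoryKey",
    values="AELossMagnitude",
    fill_value=0,
).reset_index()

display(pivot_df)
```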