
I need to split one column into 4 columns, based on another column's values, in Python/PySpark. I tried filtering on the code and joining the multiple DataFrames. Is there a better way of doing this?

Code  score  ID

AAA     12  ABCD
BBB     14  ABCD
CCC     16  ABCD
DDD     67  ABCD
AAA     89  XYZ
BBB     65  XYZ
CCC     19  XYZ
DDD     56  XYZ


ID          score_AAA   score_BBB   score_CCC   score_DDD

ABCD            12          14      16              67
XYZ             89          65      19              56
rd90080
    How can someone suggest a "better" way, if you do not post your "current" way? Can you please show the code that you have tried? – harvpan Sep 06 '19 at 18:08
  • 1
    Do you want a pandas answer or a sparksql answer? Please review [ask] – user3483203 Sep 06 '19 at 18:15
  • Possible duplicate of [How to pivot Spark DataFrame?](https://stackoverflow.com/questions/30244910/how-to-pivot-spark-dataframe) – pault Sep 06 '19 at 18:30

1 Answer


Use pivot:

# reshape: one row per ID, one column per (score, Code) pair
df = df.pivot(index='ID', columns='Code')
# flatten the resulting MultiIndex columns to 'score_<Code>'
df.columns = df.columns.get_level_values(0) + '_' + df.columns.get_level_values(1)

Result:

      score_AAA  score_BBB  score_CCC  score_DDD
ID                                              
ABCD         12         14         16         67
XYZ          89         65         19         56
Code Different