
I need to split one column into 4 columns, based on another column's values, in Python/PySpark. I tried filtering on the code and joining the multiple DataFrames. Is there a better way of doing this?

Code  score  ID

AAA     12  ABCD
BBB     14  ABCD
CCC     16  ABCD
DDD     67  ABCD
AAA     89  XYZ
BBB     65  XYZ
CCC     19  XYZ
DDD     56  XYZ


ID          score_AAA   score_BBB   score_CCC   score_DDD

ABCD            12          14      16              67
XYZ             89          65      19              56
rd90080
    How can someone suggest a "better" way, if you do not post your "current" way? Can you please show the code that you have tried? – harvpan Sep 06 '19 at 18:08
  • 1
    Do you want a pandas answer or a sparksql answer? Please review [ask] – user3483203 Sep 06 '19 at 18:15
  • Possible duplicate of [How to pivot Spark DataFrame?](https://stackoverflow.com/questions/30244910/how-to-pivot-spark-dataframe) – pault Sep 06 '19 at 18:30

1 Answer


Use pivot:

# reshape: one row per ID, one column per (score, Code) pair
df = df.pivot(index='ID', columns='Code')
# flatten the resulting MultiIndex columns to 'score_<Code>'
df.columns = df.columns.get_level_values(0) + '_' + df.columns.get_level_values(1)

Result:

      score_AAA  score_BBB  score_CCC  score_DDD
ID                                              
ABCD         12         14         16         67
XYZ          89         65         19         56
Code Different