I have a table like the one below:

DocumentId  Words  Weight
1           alpha  2.5
1           beta   4.7
1           gamma  3
2           beta   8
2           gamma  2
3           alpha  5
4           alpha  2
4           gamma  6

I want to convert it to:

DocumentId  alpha   beta   gamma
       1       2.5  4.7    3
       2       0    8      2
       3       5    0      0
       4       2    0      6

The issue is that I have around 60,000 unique words and 7 million documents.

Is there an efficient way to do this conversion?

Sahil Dahiya

1 Answer

Just had to do this myself. The usual term for this is converting from long format to wide format. You'll want to use df.pivot(). Given the column you want to keep as the row index and the column whose values should become the new column headers, your code will be:

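# values='Weight' keeps the resulting columns as plain word names
wide = df.pivot(index='DocumentId', columns='Words', values='Weight')
# fillna returns a new DataFrame, so the result has to be assigned
wide = wide.fillna(0)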
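
One caveat about scale: if the numbers in the question are read as 60,000 words and 7 million documents, a dense wide table would have 420 billion cells, which at 8 bytes per float is roughly 3.4 TB. A pivot that fills every missing pair with 0 is unlikely to fit in memory at that size. Below is a minimal sketch of one possible alternative, assuming SciPy is available (it is not mentioned in the original post): build a sparse matrix that stores only the (document, word, weight) triples that actually exist. The variable names are illustrative, and the sample data is the table from the question with "apha" read as "alpha".

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Sample long-format table from the question.
df = pd.DataFrame({
    "DocumentId": [1, 1, 1, 2, 2, 3, 4, 4],
    "Words": ["alpha", "beta", "gamma", "beta", "gamma",
              "alpha", "alpha", "gamma"],
    "Weight": [2.5, 4.7, 3, 8, 2, 5, 2, 6],
})

# Categoricals map each document and each word to a contiguous integer code.
doc_cat = df["DocumentId"].astype("category")
word_cat = df["Words"].astype("category")

# Only the triples present in the long table are stored;
# every other (document, word) pair is an implicit zero.
# Note: duplicate (document, word) pairs would be summed.
matrix = csr_matrix(
    (
        df["Weight"].to_numpy(),
        (doc_cat.cat.codes.to_numpy(), word_cat.cat.codes.to_numpy()),
    ),
    shape=(len(doc_cat.cat.categories), len(word_cat.cat.categories)),
)

doc_ids = doc_cat.cat.categories  # row labels, in matrix row order
words = word_cat.cat.categories   # column labels, in matrix column order

# On small data the sparse result can be compared with the dense pivot.
dense = df.pivot(index="DocumentId", columns="Words", values="Weight").fillna(0)
```

Here row i of matrix corresponds to doc_ids[i] and column j to words[j], so the labels are kept alongside the matrix rather than inside it. Whether a sparse matrix suits the downstream step depends on what you do with the wide table next.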
m13op22