0

I have data in the following format:

Record ID Para Tag
1          A    x
1          A    y
2          B    x 
2          B    y
1          A    z

I want to transform the data into the following format:

Record Para x_Tag y_Tag z_Tag
1       A    1     1      1
2       B    1     1      0

Need some guidance to do so in Pandas.

cs95
  • 379,657
  • 97
  • 704
  • 746
kurious
  • 1,024
  • 10
  • 29

2 Answers2

3

This is get_dummies

pd.get_dummies(df.set_index(['RecordID','Para'])).sum(level=[0,1]).reset_index()
Out[132]: 
   RecordID Para  Tag_x  Tag_y  Tag_z
0         1    A      1      1      1
1         2    B      1      1      0
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Can you please elaborate what `sum(level=[0,1])` does? – kurious May 28 '18 at 21:35
  • 1
    @kurious sum base on index , cause we setting those two columns as index , can we want to combine all the row have the same index to one row after get_dummies – BENY May 28 '18 at 21:38
  • @kurious How big is your dataFrame? – cs95 May 28 '18 at 21:45
  • 18720 rows × 3 columns -> before applying the solutions 5776 rows × 449 columns -> after applying the solution – kurious May 28 '18 at 21:52
2

pivot_table

pivot_table works nicely here, and should be fast:

df.pivot_table(
    index=['Record ID', 'Para'], columns='Tag', aggfunc='size', fill_value=0
).add_prefix('Tag_').reset_index()

Tag  Record ID Para  Tag_x  Tag_y  Tag_z
0            1    A      1      1      1
1            2    B      1      1      0

crosstab

pd.crosstab(
    [df['Record ID'], df['Para']], df['Tag']
).add_prefix('Tag_').reset_index()

Tag  Record ID Para  Tag_x  Tag_y  Tag_z
0            1    A      1      1      1
1            2    B      1      1      0
Community
  • 1
  • 1
cs95
  • 379,657
  • 97
  • 704
  • 746