1

This is my dataframe :

col_1 col_2 col_3
1 A abc x
2 A abc y
3 A abc z
4 B bcd x
5 B bcd z
6 C cde x
7 D def y

and this is what I want :

col_1 col_2 col_x col_y col_z
1 A abc 1 1 1
4 B bcd 1 0 1
6 C cde 1 0 0
7 D def 0 1 0

I try with df.pivot but I don't known how insert 0 and 1 and how to drop unused rows at once.

DarkBee
  • 16,592
  • 6
  • 46
  • 58
spergynerd
  • 53
  • 1
  • 5

2 Answers2

1

Use crosstab with DataFrame.add_prefix and then DataFrame.clip for only 0, 1 values if duplicates in data. If no duplicates is possible remove clip:

df = (pd.crosstab([df['col_1'], df['col_2']], df['col_3'])
        .add_prefix('col_')
        .clip(upper=1)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
  col_1 col_2  col_x  col_y  col_z
0     A   abc      1      1      1
1     B   bcd      1      0      1
2     C   cde      1      0      0
3     D   def      0      1      0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

One option is to execute a pd.get_dummies before grouping:

(pd.get_dummies(df, columns=['col_3'], prefix='col')
   .groupby(['col_1', 'col_2'], as_index = False)
   .sum()
)
  col_1 col_2  col_x  col_y  col_z
0     A   abc      1      1      1
1     B   bcd      1      0      1
2     C   cde      1      0      0
3     D   def      0      1      0
sammywemmy
  • 27,093
  • 4
  • 17
  • 31