I have a dataframe that looks like this:

Person   Length   Sport  
  A       1.80     1
  B       1.85     2
  A       1.80     2

I tried the following code:

pd.get_dummies(data=df, columns=['Sport'])

Which gave me the following output:

Person Length 1   2
 A      1.80  1   0
 B      1.85  0   1
 A      1.80  0   1

I'm trying to get the following output:

Person Length 1   2
 A      1.80  1   1
 B      1.85  0   1

Is there a solution for this?

Jan

1 Answer

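Because the indicator columns contain only 0 and 1, you can aggregate them with max: group by all original columns except Sport (obtained with Index.difference), so each indicator is 1 if any row in the group has that sport: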

df = (pd.get_dummies(data=df, columns=['Sport'])
        .groupby(df.columns.difference(['Sport']).tolist(), as_index=False)
        .max())
print(df)
   Length Person  Sport_1  Sport_2
0    1.80      A        1        1
1    1.85      B        0        1
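
For reference, a self-contained sketch of the same approach on the sample data from the question. The `dtype=int` argument is an addition that is not in the snippet above; it is there because newer pandas versions may return boolean dummy columns by default, and the name `key_cols` is only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Person": ["A", "B", "A"],
    "Length": [1.80, 1.85, 1.80],
    "Sport": [1, 2, 2],
})

# All original columns except Sport; Index.difference returns them sorted
key_cols = df.columns.difference(["Sport"]).tolist()

result = (
    pd.get_dummies(df, columns=["Sport"], dtype=int)  # Sport_1, Sport_2 as 0/1
      .groupby(key_cols, as_index=False)              # one group per (Length, Person)
      .max()                                          # 1 if any row in the group has the sport
)
print(result)
```

This only collapses rows whose non-Sport columns match exactly, so two rows for the same person with different Length values would stay separate.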
jezrael
  • This is unfortunately not the solution. I'm trying to get just two rows, one for person A and one for B. This instead still gives me three rows – Jan Apr 20 '20 at 08:40
  • Yes, what I'm trying to get is one row for each unique person and his/her length – Jan Apr 20 '20 at 08:47
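
If three rows remain on the real data, one possible cause (an assumption, since the full data is not shown) is that rows for the same person differ in some other column, for example slightly different Length values. A minimal sketch under that assumption groups by Person alone, keeps the first Length, and takes max over the indicator columns. The differing Length value and the names `sport_cols`, `agg_spec` and `per_person` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Person": ["A", "B", "A"],
    "Length": [1.80, 1.85, 1.81],  # hypothetical: A's Length differs between rows
    "Sport": [1, 2, 2],
})

dummies = pd.get_dummies(df, columns=["Sport"], dtype=int)

# Indicator columns created by get_dummies (Sport_1, Sport_2, ...)
sport_cols = [c for c in dummies.columns if c.startswith("Sport_")]

# Keep the first Length per person; take max over each indicator column
agg_spec = {"Length": "first", **{c: "max" for c in sport_cols}}

per_person = dummies.groupby("Person", as_index=False).agg(agg_spec)
print(per_person)
```

Choosing `"first"` for Length is a design decision; `"mean"` or `"max"` may fit better depending on what the differing values represent.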