Is it possible to do one hot encoding with multiple overlapping columns?

Question

I have a dataframe that looks like this:

Person   Length   Sport  
  A       1.80     1
  B       1.85     2
  A       1.80     2

I tried the following code:

pd.get_dummies(data=df, columns=['Sport'])

Which gave me the following output:

Person Length 1   2
 A      1.80  1   0
 B      1.85  0   1
 A      1.80  0   1

I'm trying to get the following output:

Person Length 1   2
 A      1.80  1   1
 B      1.85  0   1

Is there a solution for this?

Use `df = pd.get_dummies(data=df, columns=['Sport']).groupby(df.columns.tolist(), as_index=False).max()`, aggregate `max` by all columns — jezrael, Apr 20 '20 at 07:51

jezrael · Answer 1 · 2020-04-20T08:48:07.817


Because the indicator columns only hold 0 or 1, you can collapse duplicate rows by grouping on all original columns except `Sport` (selected with `Index.difference`) and aggregating with `max`:

df = (pd.get_dummies(data=df, columns=['Sport'], dtype=int)
        .groupby(df.columns.difference(['Sport']).tolist(), as_index=False)
        .max())
print(df)
   Length Person  Sport_1  Sport_2
0    1.80      A        1        1
1    1.85      B        0        1
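
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the question, and the `key_cols` name is only an illustrative choice, not something from the original post. Passing `dtype=int` is an assumption on my side so that the indicators stay 0/1, since newer pandas versions return booleans by default.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Person": ["A", "B", "A"],
    "Length": [1.80, 1.85, 1.80],
    "Sport": [1, 2, 2],
})

# Columns that identify one output row (everything except 'Sport').
# 'key_cols' is a hypothetical helper name used only in this sketch.
key_cols = ["Person", "Length"]

# dtype=int keeps the indicators as 0/1 instead of True/False.
dummies = pd.get_dummies(df, columns=["Sport"], dtype=int)

# max() per group gives 1 if any of that person's rows had the sport.
result = dummies.groupby(key_cols, as_index=False).max()
print(result)
```

Listing the key columns explicitly keeps them in their original order, whereas `Index.difference` returns them sorted, which is why `Length` appears before `Person` in the output above.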

edited Apr 20 '20 at 08:48

answered Apr 20 '20 at 07:59

jezrael


This is unfortunately not the solution. I'm trying to get just two rows, one for person A and one for B. This instead still gives me three rows – Jan Apr 20 '20 at 08:40
Yes, what I'm trying to get is one row for each unique person and their length – Jan Apr 20 '20 at 08:47
