1

I have a wide dataframe, with several columns of that describe the type of each row using booleans.

df = pd.DataFrame({'ID': [0, 1, 2, 3], 'Name': ['x', 'y', 'z', 'e'],
                   'Type One': [True, False, False, False], 
                   'Type Two': [False, True, True, True]})

which looks like:

   ID Name  Type One  Type Two
0   0    x      True     False
1   1    y     False      True
2   2    z     False      True
3   3    e     False      True

I would prefer that this dataframe be in a long format, such that each row is matched to a Type:

   ID Name  Type  
0   0    x     Type One
1   1    y     Type Two
2   2    z     Type Two
3   3    e     Type Two

Note: I believe that all ID's have only 1 type, (so where Type One is True, Type N must be False).

ALollz
  • 57,915
  • 7
  • 66
  • 89
Eli
  • 94
  • 11
  • 3
    `df['Type'] = df.filter(like='Type').idxmax(1)` – ALollz Nov 08 '19 at 18:00
  • @ALollz Damn that's a lot more simpler than mine... – r.ook Nov 08 '19 at 18:02
  • @ALollz the original post's column names were chosen for simplicity's sake... so I supposed you'd have to list each column name and pass those as an argument to filter()? – Eli Nov 08 '19 at 18:19
  • 1
    @Eli then in that case just pass a list to select columns by labels instead of using filter: `df[['Type One', 'Type Two']].idxmax(1)` – ALollz Nov 08 '19 at 18:20

2 Answers2

2

This is dot:

s = df[['Type One', 'Type Two']]
s.dot(s.columns)

Output:

0    Type One
1    Type Two
2    Type Two
3    Type Two
dtype: object
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • Thank you! I see that this is essentially dot product. If my original assumption (`all ID's have only 1 type, (so where Type One is True, Type N must be False)`) isn't true, will there be any way of knowing? Will this dot() still work? – Eli Nov 08 '19 at 18:18
  • `s.sum(axis=1).eq(1).all()` would check that condition for you. Dot still works, but it would return concatenation of all positive column names in that row. – Quang Hoang Nov 08 '19 at 18:21
0
df['Type'] = None
df.loc[df['Type One'], 'Type'] = 'Type One'
df.loc[df['Type Two'], 'Type'] = 'Type Two'
df.drop(columns=['Type One', 'Type Two'], inplace=True)

Output:

   ID Name      Type
0   0    x  Type One
1   1    y  Type Two
2   2    z  Type Two
3   3    e  Type Two
r.ook
  • 13,466
  • 2
  • 22
  • 39