Create a single categorical variable based on many dummy variables

Question

I have several mutually exclusive category dummy columns:

id  cat1 cat2 cat3
A    0    0    1
B    1    0    0
C    1    0    0
D    0    0    1
E    0    1    0
F    0    0    1
..

I want to create a new column that holds the category name for each row:

id  cat1 cat2 cat3 type
A    0    0    1   cat3
B    1    0    0   cat1
C    1    0    0   cat1
D    0    0    1   cat3
E    0    1    0   cat2
F    0    0    1   cat3
..

score 2 · Accepted Answer · answered Dec 02 '22 at 12:47

You can use pandas.from_dummies (available since pandas 1.5), with DataFrame.filter to select the columns whose names contain "cat":

df['type'] = pd.from_dummies(df.filter(like='cat'))

Output:

  id  cat1  cat2  cat3  type
0  A     0     0     1  cat3
1  B     1     0     0  cat1
2  C     1     0     0  cat1
3  D     0     0     1  cat3
4  E     0     1     0  cat2
5  F     0     0     1  cat3
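
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, and taking the first column with iloc is my own addition (from_dummies returns a one-column DataFrame rather than a Series), so treat it as one possible way to write it rather than the only one:

```python
import pandas as pd

# Sample data rebuilt from the question; each row has exactly one 1.
df = pd.DataFrame({
    "id": list("ABCDEF"),
    "cat1": [0, 1, 1, 0, 0, 0],
    "cat2": [0, 0, 0, 0, 1, 0],
    "cat3": [1, 0, 0, 1, 0, 1],
})

# Keep only the dummy columns (names containing "cat").
dummies = df.filter(like="cat")

# from_dummies (pandas >= 1.5) returns a one-column DataFrame;
# take that column as a Series before assigning it.
df["type"] = pd.from_dummies(dummies).iloc[:, 0]

print(df)
```

Note that filter(like="cat") matches any column whose name contains "cat", so it would also pick up an unrelated column such as "category" if one existed.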

mozway

This is great - I never used this before, had been using the sklearn [OneHotEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html) and variations of [its inverse](https://stackoverflow.com/questions/22548731/how-to-reverse-sklearn-onehotencoder-transform-to-recover-original-data) thanks! – Thomas Kimber Dec 05 '22 at 10:00

jezrael · Answer 2 · 2022-12-02T13:05:39.370

Use DataFrame.dot with DataFrame.filter to select the columns whose names contain "cat". If a row has more than one 1, the matching column names are joined with commas:

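# boolean mask: True where a dummy column equals 1
m = df.filter(like='cat').eq(1)
# alternative: use every column except the first
# m = df.iloc[:, 1:].eq(1)
# True * 'cat1,' gives 'cat1,' and False gives '', so dot concatenates the names;
# .str[:-1] drops the trailing comma
df['type'] = m.dot(m.columns + ',').str[:-1]
print(df)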
  id  cat1  cat2  cat3  type
0  A     0     0     1  cat3
1  B     1     0     0  cat1
2  C     1     0     0  cat1
3  D     0     0     1  cat3
4  E     0     1     0  cat2
5  F     0     0     1  cat3
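
To illustrate the comma-separated case, here is a small sketch with made-up data (not from the question) in which row C has two dummies set. The values and the variable name labels are assumptions for the example only:

```python
import pandas as pd

# Hypothetical data: row C belongs to both cat1 and cat2.
df = pd.DataFrame({
    "id": ["A", "B", "C"],
    "cat1": [0, 1, 1],
    "cat2": [0, 0, 1],
    "cat3": [1, 0, 0],
})

# Boolean mask of the dummy columns.
m = df.filter(like="cat").eq(1)

# Multiply each True by "<column name>," and concatenate per row,
# then strip the trailing comma.
labels = m.dot(m.columns + ",").str[:-1]
df["type"] = labels

print(df)
```

Rows A and B get a single name, while row C should come out as "cat1,cat2". A row with no 1 at all would end up as an empty string.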
