I want to one-hot encode one column in my data. The column looks like this:

    app   
0   a       
1   b      
2   c      
3   a    

I've performed:

pd.get_dummies(df, columns=['app'])
   app_a  app_b  app_c
0      1      0      0
1      0      1      0
2      0      0      1
3      1      0      0

In reality, though, the app column can also contain the value 'd', which does not appear in my training data. So I want an app_d column to be present after get_dummies even though no row in my data has the value 'd'.

Is there a way to one-hot encode my sample data above into a predefined set of columns? What I want looks like this:

   app_a  app_b  app_c  app_d
0      1      0      0      0
1      0      1      0      0
2      0      0      1      0
3      1      0      0      0
funie200
Adiansyah

1 Answer

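Try converting your column to a pandas Categorical and specifying the full list of values in the categories argument. get_dummies then creates a column for every category, including ones that never occur in the data: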

df['app'] = pd.Categorical(df['app'], categories=['a', 'b', 'c', 'd'])

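# dtype=int keeps the 0/1 output shown below; newer pandas versions return booleans by default
pd.get_dummies(df['app'], prefix='app', dtype=int)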

[out]

   app_a  app_b  app_c  app_d
0      1      0      0      0
1      0      1      0      0
2      0      0      1      0
3      1      0      0      0
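
If the same encoding has to be applied to both the training data and later data, the first approach can be wrapped in a small helper. This is only a sketch: the names `APP_CATEGORIES`, `encode_app`, `train` and `new` are made up for illustration, and the category list is assumed to be known in advance.

```python
import pandas as pd

# Hypothetical: the full list of values the 'app' column may ever take.
APP_CATEGORIES = ['a', 'b', 'c', 'd']

def encode_app(frame):
    """One-hot encode frame['app'] against the fixed category list."""
    app = frame['app'].astype(pd.CategoricalDtype(categories=APP_CATEGORIES))
    # dtype=int gives 0/1 columns; newer pandas versions default to booleans.
    return pd.get_dummies(app, prefix='app', dtype=int)

train = pd.DataFrame({'app': ['a', 'b', 'c', 'a']})  # no 'd' in training data
new = pd.DataFrame({'app': ['d', 'a']})              # 'd' shows up later

train_encoded = encode_app(train)
new_encoded = encode_app(new)
print(train_encoded)
print(new_encoded)
```

Both results have the same four columns in the same order, so they can be fed to the same model. A value outside the list becomes NaN during the conversion and ends up as a row of zeros.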

Alternatively you could convert to Categorical type and use the cat.add_categories accessor method to update categories after the fact:

df['app'] = pd.Categorical(df['app'])

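# add_categories returns a new Series (the inplace argument was removed in pandas 2.0), so assign it back
df['app'] = df['app'].cat.add_categories(['d'])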
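
For completeness, here is a minimal end-to-end sketch of this second approach, using the sample data from the question. It assumes a pandas version where `add_categories` returns a new Series that has to be assigned back; the variable name `encoded` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'app': ['a', 'b', 'c', 'a']})

# Categories are inferred from the data: a, b, c.
df['app'] = pd.Categorical(df['app'])

# Register the unseen value; the result is a new Series, so assign it back.
df['app'] = df['app'].cat.add_categories(['d'])

# get_dummies emits one column per category, including the unobserved 'd'.
encoded = pd.get_dummies(df['app'], prefix='app', dtype=int)
print(encoded)
```

The app_d column is present and all zeros, because 'd' is a known category that no row uses.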
Chris Adams