I'm trying to encode categorical values as dummy vectors. pandas.get_dummies does this well, but the dummy columns it produces depend on the values present in the DataFrame. How can I encode a second DataFrame with the same dummy columns as the first DataFrame?
import pandas as pd

# First DataFrame: 'cat1' contains all four categories
df = pd.DataFrame({'cat1': ['A', 'N', 'K', 'P'], 'cat2': ['C', 'S', 'T', 'B']})
b = pd.get_dummies(df['cat1'], prefix='cat1').astype('int')
print(b)
   cat1_A  cat1_K  cat1_N  cat1_P
0       1       0       0       0
1       0       0       1       0
2       0       1       0       0
3       0       0       0       1
# Second DataFrame: 'cat1' contains only two of the four categories
df_test = pd.DataFrame({'cat1': ['A', 'N'], 'cat2': ['T', 'B']})
c = pd.get_dummies(df_test['cat1'], prefix='cat1').astype('int')
print(c)
   cat1_A  cat1_N
0       1       0
1       0       1
How can I get this output for the second DataFrame instead?
   cat1_A  cat1_K  cat1_N  cat1_P
0       1       0       0       0
1       0       0       1       0
I was thinking of manually computing the unique values for each column and then building a dictionary to map the second DataFrame, but I'm sure there is already a function for that. Thanks!
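
For reference, the manual idea of collecting the unique values first could look roughly like the following. This is only a minimal sketch, not necessarily the recommended way: it assumes the first DataFrame contains every category of interest, and the names `categories` and `test_cat` are illustrative, not part of the code above. It relies on get_dummies emitting one column per declared category when given categorical data, even for categories that do not occur.

```python
import pandas as pd

df = pd.DataFrame({'cat1': ['A', 'N', 'K', 'P'], 'cat2': ['C', 'S', 'T', 'B']})
df_test = pd.DataFrame({'cat1': ['A', 'N'], 'cat2': ['T', 'B']})

# Collect the categories seen in the first DataFrame (sorted for a stable column order).
categories = sorted(df['cat1'].unique())

# Declare those categories on the second DataFrame's column, so that
# get_dummies creates a column for each of them, present or not.
test_cat = df_test['cat1'].astype(pd.CategoricalDtype(categories=categories))
c = pd.get_dummies(test_cat, prefix='cat1').astype('int')
print(c)
```

A value in the second DataFrame that was never seen in the first one becomes missing after the cast and ends up as a row of all zeros, which may or may not be what is wanted.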