It seems quick and easy to one-hot encode multiple categorical variables at once using the pd.get_dummies method, but how do I keep track of which column is which, so that my test data ends up with the same features as my training data? For example:
My training dataset has a CATEGORICAL feature:
X
cat
dog
lion
lion
After get_dummies, I get something like this:
X_1 X_2 X_3
1 0 0
0 1 0
0 0 1
0 0 1
After training the model, I am ready to test my awesome magic model, and here is the test data:
X
cat
cat
lion
If I apply pd.get_dummies to it, I get something like this:
X_1 X_2
1 0
1 0
0 1
This is inconsistent with my training features (the column for dog is missing, so the remaining columns no longer line up), and I simply can't apply my model to the test data.
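To make the mismatch concrete, here is a minimal sketch that reproduces it with the toy data above. The variable names are just for illustration, and note that pandas actually names the columns after the category values (X_cat, X_dog, X_lion) rather than X_1, X_2, X_3 as in my simplified listing.

```python
import pandas as pd

# Toy data copied from the listings above
train = pd.DataFrame({"X": ["cat", "dog", "lion", "lion"]})
test = pd.DataFrame({"X": ["cat", "cat", "lion"]})

# Encode each frame independently, as described in the question
train_dummies = pd.get_dummies(train, columns=["X"])
test_dummies = pd.get_dummies(test, columns=["X"])

train_cols = list(train_dummies.columns)
test_cols = list(test_dummies.columns)

# The test frame has no column for "dog", so the two feature sets differ
print(train_cols)
print(test_cols)
```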
Any suggestions for how I can get something like the following instead?
X_1 X_2 X_3
1 0 0
1 0 0
0 0 1
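The only workaround I can think of is sketched below, and I'm not sure it is the right approach: encode the test data separately, then reindex its columns against the training columns, filling the missing ones with 0. This assumes the training dummy columns are kept around after encoding, and it silently drops any category that appears only in the test data.

```python
import pandas as pd

# Toy data copied from the listings above
train = pd.DataFrame({"X": ["cat", "dog", "lion", "lion"]})
test = pd.DataFrame({"X": ["cat", "cat", "lion"]})

# "Fit" step: remember the dummy columns produced from the training data
train_dummies = pd.get_dummies(train, columns=["X"])
train_columns = train_dummies.columns

# "Transform" step: encode the test data, then force the training layout.
# Columns missing from the test data (here X_dog) are filled with 0;
# columns the training data never produced would be dropped.
test_dummies = pd.get_dummies(test, columns=["X"])
aligned = test_dummies.reindex(columns=train_columns, fill_value=0).astype(int)

print(aligned)
```

This gives the layout I want for one column, but it feels like a manual stand-in for a real fit/transform step.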
How can I get fit and transform functionality? Again, I have over 50 categorical features, and I can't apply LabelEncoder and then OneHotEncoder to them one by one.
Any suggestions? Thank you.