I have two dataframes, train and test. They both have exactly the same column names, and those columns contain categorical string features.
I'm trying to map these features to dummy variables in the training set, train a regression model, then do the same exact mapping for the test set and apply the trained model to it.
The problem I came across is that, since test is smaller than train, it happens not to contain all the possible values for some of the categorical features. Since pandas.get_dummies() seems to just look at data.Series.unique() to create new columns, after adding dummy columns in the same way for train and test, test now has fewer columns.
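To make the mismatch concrete, here is a minimal sketch of what I'm seeing. The color column and its values are made up for illustration; my real data has several such features.

```python
import pandas as pd

# Hypothetical toy data: "color" stands in for one categorical string feature.
train = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
test = pd.DataFrame({"color": ["red", "green"]})  # "blue" never appears in test

# Encode each dataframe independently, which is what I am doing now.
train_dummies = pd.get_dummies(train)
test_dummies = pd.get_dummies(test)

print(list(train_dummies.columns))  # ['color_blue', 'color_green', 'color_red']
print(list(test_dummies.columns))   # ['color_green', 'color_red']

# Dummy columns the model was trained on but that test lacks.
missing = sorted(set(train_dummies.columns) - set(test_dummies.columns))
print(missing)  # ['color_blue']
```

A model fitted on train_dummies expects three input columns here, so it cannot be applied to test_dummies as it stands.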
So how can I instead add dummy columns for train, and then use the same exact column names for test, even if for particular features in test, test.feature.unique() is a subset of train.feature.unique()? I looked at the pd.get_dummies documentation, but I don't think I see anything that'll do what I'm looking for. Any help is greatly appreciated!