I have two dataframes, train and test. Both have exactly the same column names, and the columns contain categorical string features.

I'm trying to map these features to dummy variables in the training set, train a regression model, then do the same exact mapping for the test set and apply the trained model to it.

The problem I ran into is that, since test is smaller than train, it does not contain every possible value for some of the categorical features. pandas.get_dummies() seems to build new columns only from the values actually present in each column (i.e. Series.unique()), so after adding dummy columns in the same way for train and test, test ends up with fewer columns.
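A minimal reproduction of the mismatch, using made-up data (the `color` column and its values are only placeholders, not my real features):

```python
import pandas as pd

# Made-up data: "color" takes three values in train but only two in test.
train = pd.DataFrame({"color": ["red", "green", "blue", "red"]})
test = pd.DataFrame({"color": ["red", "green"]})

# Encode each frame independently, as described above.
train_dummies = pd.get_dummies(train)
test_dummies = pd.get_dummies(test)

print(list(train_dummies.columns))  # ['color_blue', 'color_green', 'color_red']
print(list(test_dummies.columns))   # ['color_green', 'color_red']
```

Since "blue" never appears in test, there is no `color_blue` column there, and the two frames no longer line up.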

So how can I instead add dummy columns for train, and then use the same exact column names for test, even if for particular features in test, test.feature.unique() is a subset of train.feature.unique()? I looked at the pd.get_dummies documentation, but I don't think I see anything that'll do what I'm looking for. Any help is greatly appreciated!
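To make the desired result concrete, here is a sketch of what I'm after, done by hand with `DataFrame.reindex` on made-up data (the `color` column is a placeholder). I'm not sure this is the idiomatic way, and it assumes train contains every category that matters:

```python
import pandas as pd

# Made-up data: test is missing the "blue" category that train has.
train = pd.DataFrame({"color": ["red", "green", "blue", "red"]})
test = pd.DataFrame({"color": ["red", "green"]})

train_dummies = pd.get_dummies(train)

# Force test onto train's dummy columns, in train's order.
# Columns absent from test are created and filled with 0;
# columns that exist only in test would be dropped.
test_dummies = pd.get_dummies(test).reindex(
    columns=train_dummies.columns, fill_value=0
)

print(list(test_dummies.columns) == list(train_dummies.columns))  # True
```

With this, test has the same columns as train, and `color_blue` is all zeros.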

Austin
