Duplicating pandas.get_dummies columns from train to test data

Asked Aug 16 '17 at 01:01

Active Aug 16 '17 at 01:01

Viewed 2,621 times

I have two dataframes, train and test. They both have exactly the same column names, and the columns contain categorical string features.

I'm trying to map these features to dummy variables in the training set, train a regression model, then apply exactly the same mapping to the test set and run the trained model on it.

The problem I came across is that, since test is smaller than train, it happens not to contain all the possible values for some of the categorical features. Since pandas.get_dummies() seems to just look at Series.unique() to create new columns, after adding dummy columns in the same way for train and test, test ends up with fewer columns than train.

So how can I add dummy columns for train, and then use exactly the same column names for test, even if for particular features test.feature.unique() is only a subset of train.feature.unique()? I looked at the pd.get_dummies documentation, but I don't see anything that does what I'm looking for. Any help is greatly appreciated!
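For illustration, here is a minimal sketch of one common way to get matching columns: encode both frames with pd.get_dummies, then reindex the test result to the train columns. The toy data and the column name `color` are hypothetical, and this is not necessarily the approach in the answer linked in the comments.

```python
import pandas as pd

# Hypothetical toy data: test is missing the "green" category.
train = pd.DataFrame({"color": ["red", "green", "blue", "red"]})
test = pd.DataFrame({"color": ["red", "blue"]})

train_dummies = pd.get_dummies(train)
test_dummies = pd.get_dummies(test)

# Align test to the train columns: dummy columns absent from test are
# added and filled with 0; columns that train never had are dropped.
test_dummies = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)

print(list(train_dummies.columns))
print(list(test_dummies.columns))
```

Note that the reindex also silently drops any category that appears only in test, which is usually what a model trained on the train columns needs, but it is worth checking for.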

asked Aug 16 '17 at 01:01

Austin



This should do it: https://stackoverflow.com/a/37451867/2285236 – ayhan Aug 16 '17 at 01:04
Thanks, that looks like it'll work, trying it now :) – Austin Aug 16 '17 at 01:06

