
Consider the following example table, which I'm trying to make predictions on:

[Image: example input table]

As you can see, I have a mix of numerical features (Num1 and Num2) and categorical features (Cat1 and Cat2) to predict a value, and I'm using Random Forest Regression to do so.

After reading in the file, I convert the categorical features into numerical ones using LabelEncoder, like so:

from sklearn import preprocessing

category_col = ['Cat1', 'Cat2']
labelEncoder = preprocessing.LabelEncoder()

# Build a map from each categorical label to its numeric code, per column.
mapping_dict = {}
for col in category_col:
    # Note: the same encoder is refit for every column.
    df[col] = labelEncoder.fit_transform(df[col])
    le_name_mapping = dict(zip(labelEncoder.classes_, labelEncoder.transform(labelEncoder.classes_)))
    mapping_dict[col] = le_name_mapping
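A small run of the loop above on made-up data (the toy values are assumptions; only the column names come from the question) shows what mapping_dict ends up holding. It also shows that, because one LabelEncoder is refit for every column, labelEncoder only remembers Cat2's classes afterwards, so labelEncoder.inverse_transform cannot decode Cat1.

```python
import pandas as pd
from sklearn import preprocessing

# Made-up toy data; only the column names come from the question
df = pd.DataFrame({
    'Num1': [1.0, 2.0, 3.0, 4.0],
    'Num2': [10, 20, 30, 40],
    'Cat1': ['red', 'blue', 'red', 'green'],
    'Cat2': ['small', 'large', 'large', 'small'],
})

category_col = ['Cat1', 'Cat2']
labelEncoder = preprocessing.LabelEncoder()

mapping_dict = {}
for col in category_col:
    # fit_transform refits the shared encoder, discarding the previous column's classes
    df[col] = labelEncoder.fit_transform(df[col])
    mapping_dict[col] = dict(zip(labelEncoder.classes_, labelEncoder.transform(labelEncoder.classes_)))

print(mapping_dict)

# After the loop the shared encoder only knows Cat2's classes
print(list(labelEncoder.classes_))
print(labelEncoder.inverse_transform(df['Cat2']))
```

LabelEncoder assigns codes in sorted label order, so here Cat1 becomes blue=0, green=1, red=2.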

Once the columns are converted, I split the DataFrame into training and testing sets and make predictions, like so:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

train_features, test_features, train_labels, test_labels = train_test_split(df, labels, test_size=0.30)

rf = RandomForestRegressor(n_estimators=1000)
rf.fit(train_features, train_labels)
predictions = rf.predict(test_features)

My question is: how do I convert the numeric codes in Cat1 and Cat2 back to their original categories, so that I can export the predictions alongside them, like so:

[Image: desired output table with the original Cat1 and Cat2 categories next to the predictions]

I understand that I need to use labelEncoder.inverse_transform; however, I can't seem to get the syntax right to recover the category text and tie it back to the results.

Any help is appreciated!

ThatRiddimGuy

1 Answer


Quick solution, based on the code you already have:

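# Invert the mapping dictionary you created: {column: {code: original_label}}
inv_mapping_dict = {col: {v: k for k, v in map_dict.items()} for col, map_dict in mapping_dict.items()}

# rf.predict returns a NumPy array, so attach it to a copy of the test features.
results = test_features.copy()
results['Prediction'] = predictions  # hypothetical column name

# DataFrame.replace returns a new frame, so assign the result back.
results = results.replace(inv_mapping_dict)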
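To make the quick solution concrete, here is a minimal end-to-end sketch on made-up data. The toy values, the `labels` series and the `Prediction` column name are assumptions, not taken from the question. It follows the question's encoding loop, attaches the predicted values to a copy of `test_features`, and then decodes Cat1 and Cat2 with the inverted dictionary.

```python
import pandas as pd
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Made-up toy data standing in for the question's table
df = pd.DataFrame({
    'Num1': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    'Num2': [10, 20, 30, 40, 50, 60, 70, 80],
    'Cat1': ['red', 'blue', 'red', 'green', 'blue', 'green', 'red', 'blue'],
    'Cat2': ['small', 'large', 'large', 'small', 'small', 'large', 'small', 'large'],
})
labels = pd.Series([1.5, 2.0, 3.1, 4.2, 5.0, 6.3, 7.1, 8.4])  # hypothetical target values

# Encode as in the question, keeping a per-column {label: code} mapping
category_col = ['Cat1', 'Cat2']
labelEncoder = preprocessing.LabelEncoder()
mapping_dict = {}
for col in category_col:
    df[col] = labelEncoder.fit_transform(df[col])
    mapping_dict[col] = dict(zip(labelEncoder.classes_, labelEncoder.transform(labelEncoder.classes_)))

train_features, test_features, train_labels, test_labels = train_test_split(
    df, labels, test_size=0.30, random_state=0)

rf = RandomForestRegressor(n_estimators=50, random_state=0)
rf.fit(train_features, train_labels)
predictions = rf.predict(test_features)

# Invert the mapping: {column: {code: original label}}
inv_mapping_dict = {col: {v: k for k, v in m.items()} for col, m in mapping_dict.items()}

# Attach the predictions to the test rows, then decode only the categorical columns
results = test_features.copy()
results['Prediction'] = predictions  # hypothetical column name
results = results.replace(inv_mapping_dict)
print(results)
```

Because `train_test_split` keeps the original row index of a DataFrame, `results` can also be joined back to the untouched input rows through that index if you need other columns in the export.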

For a slightly cleaner approach, you can also consider the answer here when creating your initial mapping dictionary:

Label encoding across multiple columns in scikit-learn

Instead of looping over your category columns to build a mapping dictionary, you can create a dictionary of LabelEncoders, one per column, then fit and transform all of those columns at once at the beginning and inverse-transform them all at once at the end.
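A minimal sketch of that idea, applied only to the two categorical columns so the numeric ones are left alone. It uses `collections.defaultdict(LabelEncoder)` so that each column gets its own encoder, keyed by column name; this is one common way to write it and is not claimed to be the exact code from the linked answer. The toy data is made up.

```python
from collections import defaultdict

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up toy data; only the column names come from the question
df = pd.DataFrame({
    'Num1': [1.0, 2.0, 3.0, 4.0],
    'Num2': [10, 20, 30, 40],
    'Cat1': ['red', 'blue', 'red', 'green'],
    'Cat2': ['small', 'large', 'large', 'small'],
})
category_col = ['Cat1', 'Cat2']

# One LabelEncoder per column, created on first access by column name
encoders = defaultdict(LabelEncoder)

# Encode only the selected columns; s.name is the column name inside apply
df[category_col] = df[category_col].apply(lambda s: encoders[s.name].fit_transform(s))

# ... split, fit and predict as before ...

# Decode the same columns later (here on df itself, e.g. on a copy of test_features in practice)
decoded = df[category_col].apply(lambda s: encoders[s.name].inverse_transform(s))
print(decoded)
```

Keeping a separate encoder per column is what makes a later `inverse_transform` possible for every column, rather than only the last one fitted.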

Clarence Leung
  • Thanks. I did have a question about the link you posted: the method shown there encodes ALL variables in my data frame. How can I single out just the two columns I need and encode only those using that method? – ThatRiddimGuy Apr 19 '19 at 00:40