Consider the following example table that I'm trying to make predictions on:
As you can see, I have a mix of numerical (Num1 & Num2) and categorical (Cat1 & Cat2) features for predicting a value, and I'm using Random Forest Regression to do so.
After reading in the file, I convert the categorical features into numerical ones using LabelEncoder, like so:
from sklearn import preprocessing

category_col = ['Cat1', 'Cat2']
labelEncoder = preprocessing.LabelEncoder()
# Build a map from each categorical column to its {original label: encoded integer} pairs.
mapping_dict = {}
for col in category_col:
    df[col] = labelEncoder.fit_transform(df[col])
    le_name_mapping = dict(zip(labelEncoder.classes_, labelEncoder.transform(labelEncoder.classes_)))
    mapping_dict[col] = le_name_mapping
Once converted, I split my dataframe into training and testing sets and make predictions, like so:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

train_features, test_features, train_labels, test_labels = train_test_split(df, labels, test_size=0.30)
rf = RandomForestRegressor(n_estimators=1000)
rf.fit(train_features, train_labels)
predictions = rf.predict(test_features)
My question is: how do I convert the encoded integers in Cat1 & Cat2 back to the original categories so that I can export the predictions, like so:
I understand that I need to use labelEncoder.inverse_transform; however, I can't seem to get the syntax right to recover the category text and tie it in with the results.
Any help is appreciated!
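For reference, here is a minimal sketch of one way the decoding step could look. It rests on a few assumptions that are not in the post: the small dataframe is made-up stand-in data, the target column is called `Value`, and `encoders`, `results`, and `Prediction` are hypothetical names. The key change is keeping one fitted LabelEncoder per column. The loop above reuses a single encoder, and each call to `fit_transform` refits it, so after the loop it only remembers the classes of the last column and cannot invert the earlier ones.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in data; the real table from the post is not reproduced here.
df = pd.DataFrame({
    "Num1": [1.0, 2.5, 3.1, 4.7, 5.2, 6.8, 7.4, 8.9, 9.3, 10.6],
    "Num2": [10, 20, 15, 30, 25, 40, 35, 50, 45, 60],
    "Cat1": ["a", "b", "a", "c", "b", "c", "a", "b", "c", "a"],
    "Cat2": ["x", "y", "y", "x", "z", "z", "x", "y", "z", "x"],
    "Value": [1.2, 2.9, 2.4, 4.8, 4.1, 6.5, 5.9, 8.0, 7.7, 9.4],
})
original = df.copy()  # kept only to check the round trip

category_col = ["Cat1", "Cat2"]

# Keep one fitted encoder per column so each column can be inverted later.
encoders = {}
for col in category_col:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

labels = df.pop("Value")  # assumed name of the target column
train_features, test_features, train_labels, test_labels = train_test_split(
    df, labels, test_size=0.30, random_state=0
)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(train_features, train_labels)
predictions = rf.predict(test_features)

# Decode the categorical columns on a copy and attach the predictions.
results = test_features.copy()
for col in category_col:
    results[col] = encoders[col].inverse_transform(results[col])
results["Prediction"] = predictions

print(results)
```

From there, `results.to_csv(...)` would export the decoded rows together with their predictions. If refitting is not an option, the `mapping_dict` built in the original loop should also work: inverting each column's dictionary and passing it to `Series.map`, for example `results[col].map({v: k for k, v in mapping_dict[col].items()})`, gives the same decoding without calling `inverse_transform`.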