Reversing Sci-Kit LabelEncoder, but have a 2D array dataset

Question

I'm trying to create an automated data pre-processing library, and I want to transform the string data into numerical data so it can be run through ML algorithms. But I can't seem to reverse it back to its original state, which should be relatively simple given that scikit-learn has a built-in "inverse_transform()" method.

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

def transformCatagorical(data):
    catagorical_data = data.select_dtypes(include=['object']).columns.tolist()

    for cat in catagorical_data:
        transform = le.fit_transform(data[cat].astype(str))
        data[cat] = transform

This is our transformation function, which yields good results, as shown here: Transformed Data

But when we try to reverse it using this function:

def reverse(orig, data):
    cols = get_categorical_columns(orig)
    for col in cols:
        data[col] = le.inverse_transform(data[col])

It turns the data into a completely random, coordinate-like structure. I'm not sure how to explain it without a picture: Picture of wrongly transformed data

I've been trying to figure out how/why it's doing this but honestly I'm completely lost. Any help would be appreciated! Thank you!

Does this answer your question? [How to pass test data for classification model if independent variables are categorical in python?](https://stackoverflow.com/questions/72574827/how-to-pass-test-data-for-classification-model-if-independent-variables-are-cate) — Ben Reiniger, Jul 07 '22 at 21:50
(the question asked in the proposed duplicate is different, but the answer is the same...) — Ben Reiniger, Jul 08 '22 at 00:32
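Going only by the code in the question, a likely cause is that the single shared `le` is refit on every column inside the loop, so once the loop finishes it only remembers the classes of the last column it saw. Calling `inverse_transform` on any other column then maps its integers onto the wrong labels. Below is a minimal sketch of one way around this, keeping one encoder per column. The names `transform_categorical`, `reverse`, and `encoders`, as well as the sample DataFrame, are made up for illustration and are not from the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def transform_categorical(data):
    """Encode every object column in place, one LabelEncoder per column.

    Returns a dict mapping column name -> fitted encoder so the
    transformation can be undone later.
    """
    encoders = {}
    for col in data.select_dtypes(include=["object"]).columns:
        le = LabelEncoder()  # fresh encoder, so earlier fits are not overwritten
        data[col] = le.fit_transform(data[col].astype(str))
        encoders[col] = le
    return encoders


def reverse(data, encoders):
    """Undo transform_categorical using the encoder fitted for each column."""
    for col, le in encoders.items():
        data[col] = le.inverse_transform(data[col])


# Hypothetical sample data. dtype=object is set explicitly so that
# select_dtypes(include=["object"]) picks these columns up.
df = pd.DataFrame({
    "color": pd.Series(["red", "blue", "red"], dtype=object),
    "size": pd.Series(["S", "L", "M"], dtype=object),
    "price": [1.0, 2.0, 3.0],
})

encoders = transform_categorical(df)  # string columns are now integer codes
reverse(df, encoders)                 # string columns hold their labels again
```

If the goal is just to encode several feature columns at once, scikit-learn's `OrdinalEncoder` accepts 2D input and also has an `inverse_transform` method, which may be simpler than managing the dict by hand.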
