I'm trying to create an automated data pre-processing library, and I want to transform the string data into numerical values so it can be run through ML algorithms. But I can't seem to reverse the encoding back to its original state, which should be relatively simple given that scikit-learn has a built-in "inverse_transform()" method.
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

def transformCatagorical(data):
    catagorical_data = data.select_dtypes(include=['object']).columns.tolist()
    for cat in catagorical_data:
        transform = le.fit_transform(data[cat].astype(str))
        data[cat] = transform
This is our transformation function, which yields good results, as shown here: Transformed Data
But when we try to reverse it using this function:
def reverse(orig, data):
    cols = get_categorical_columns(orig)
    for col in cols:
        data[col] = le.inverse_transform(data[col])
It turns the data into a completely random, coordinate-like structure. I'm not sure how to explain it without a picture: Picture of wrongly transformed data
I've been trying to figure out how/why it's doing this but honestly I'm completely lost. Any help would be appreciated! Thank you!
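For reference, here is a minimal sketch of a round trip that does restore the original strings. It rests on one observation: each call to "fit_transform()" refits the encoder, so a single shared LabelEncoder only remembers the classes of the last column it was fitted on. Whether that is the cause of the output above is an assumption, since the screenshots are not reproduced here. The names encode_categoricals, decode_categoricals, and encoders, as well as the sample DataFrame, are hypothetical and not part of the code above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_categoricals(data):
    """Encode object columns in place and return one fitted encoder per column."""
    encoders = {}
    for col in data.select_dtypes(include=["object"]).columns:
        enc = LabelEncoder()  # a fresh encoder for every column
        data[col] = enc.fit_transform(data[col].astype(str))
        encoders[col] = enc
    return encoders


def decode_categoricals(data, encoders):
    """Reverse the encoding in place using the encoder fitted on each column."""
    for col, enc in encoders.items():
        data[col] = enc.inverse_transform(data[col])


# Hypothetical sample data; string columns are forced to object dtype
# so that select_dtypes(include=["object"]) picks them up.
df = pd.DataFrame({
    "city": pd.Series(["Oslo", "Lima", "Oslo"], dtype=object),
    "size": pd.Series(["S", "L", "M"], dtype=object),
    "n": [1, 2, 3],
})

encoders = encode_categoricals(df)
encoded = df.copy()  # snapshot of the numeric representation

decode_categoricals(df, encoders)  # df holds the original strings again
```

Keeping the encoders in a dictionary keyed by column name means the reverse step no longer needs the original DataFrame to work out which columns were encoded.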