Converting numeric labels back to original strings

Question

import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer

data = pd.read_csv('data/TrainingData_unsubscribe.csv')

data['labels'] = data['Category'].factorize()[0]

#vectorize the features
tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, norm='l2', encoding='latin-1', stop_words='english')


x_vectors = tfidf.fit_transform(data.msgContent)
#split the data
x = x_vectors
y = data.labels
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.2, random_state=42)

x_train = x_train.toarray()
x_train.shape
x.shape
x_test = x_test.toarray()
preds = x_vectors.toarray()
#early stopping callback
stop = tf.keras.callbacks.EarlyStopping(monitor='accuracy', patience=10)
#create the model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(len(pd.unique(y)), activation='softmax')
    ])

#compile the model
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              optimizer = tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

#fit the model
model.fit(x_train,y_train, epochs=500, verbose=0, callbacks=[stop])

#evaluation
print('\nEvaluation: ')
model.evaluate(x_test,y_test)

predictions = model.predict(preds)
len(pd.unique(y))

data["Prediction"] = predictions.argmax(axis=1)
output = data.drop(["labels"], axis=1)
category_ids = data[["Category", "labels"]].drop_duplicates()
output['Prediction'] =

I originally converted the string labels to numeric ones using factorize():

data['labels'] = data['Category'].factorize()[0]

Now I'm trying to convert the labels back to their original string values. I've created a DataFrame with the mapped values:

Category	labels
HardBounce	0
SoftBounce	1

etc...

Is it possible to map a DataFrame column using another DataFrame as the reference for the mapping? I've been unable to find any docs that show how to do this.
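
A minimal sketch of one way this might be done: turn the lookup DataFrame into a Series indexed by the numeric label, then pass that Series to `Series.map`. Only the column names `Category`, `labels` and `Prediction` come from the question; the rows below are made-up stand-ins for the real data.

```python
import pandas as pd

# Hypothetical stand-in for category_ids from the question.
category_ids = pd.DataFrame({
    "Category": ["HardBounce", "SoftBounce"],
    "labels": [0, 1],
})

# Hypothetical stand-in for the numeric predictions (the argmax output).
output = pd.DataFrame({"Prediction": [1, 0, 0, 1]})

# Series indexed by numeric label, with the category string as the value.
label_to_category = category_ids.set_index("labels")["Category"]

# Series.map accepts a Series and looks each value up in its index.
output["Prediction"] = output["Prediction"].map(label_to_category)
print(output)
```

Any prediction that has no matching row in the lookup comes back as NaN, which makes missing labels easy to spot.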

While it doesn't answer your specific question, in general one thing that can be helpful is to do all of your transformations inside the sklearn framework. For example, using [LabelEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html) instead of `factorize()` would give you access to the `inverse_transform` method to do exactly what you're after – G. Anderson, Aug 20 '21 at 17:36
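
As a rough sketch of the `LabelEncoder` suggestion, assuming made-up category strings and predictions: the encoder is fitted on the string labels, and `inverse_transform` later turns integer predictions back into strings. Note that `LabelEncoder` numbers the classes in sorted order, while `factorize()` numbers them in order of first appearance, so the codes may not match the ones in the question.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical category strings standing in for data['Category'].
categories = np.array(["HardBounce", "SoftBounce", "HardBounce", "Unsubscribe"])

# Fit the encoder and get integer labels to train on.
encoder = LabelEncoder()
labels = encoder.fit_transform(categories)

# Hypothetical integer predictions, e.g. predictions.argmax(axis=1).
predicted = np.array([2, 0, 1])

# Map the integers back to the original strings.
decoded = encoder.inverse_transform(predicted)
print(decoded)
```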
Rather than storing your mappings in a dataframe, why not use a dictionary? Then the solution in [Remap values in pandas column with a dict](https://stackoverflow.com/questions/20250771/remap-values-in-pandas-column-with-a-dict) would work. Note: you can also just use `df.to_dict()` to get your current mappings into dictionary format – G. Anderson, Aug 20 '21 at 17:37
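
A small sketch of the dictionary route, using made-up rows. Calling `to_dict()` directly on a two-column DataFrame returns a dict keyed by column name, so this example builds the `{label: category}` dict with `zip` instead; that is one option among several, not necessarily what the comment had in mind.

```python
import pandas as pd

# Hypothetical stand-in for category_ids from the question.
category_ids = pd.DataFrame({
    "Category": ["HardBounce", "SoftBounce"],
    "labels": [0, 1],
})

# Build a plain {numeric label: category string} dictionary.
mapping = dict(zip(category_ids["labels"], category_ids["Category"]))

# Hypothetical numeric predictions.
predictions = pd.Series([0, 1, 1, 0], name="Prediction")

# Series.map with a dict replaces each value by its dictionary entry.
decoded = predictions.map(mapping)
print(decoded)
```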
