I want to merge my predicted results of my test data to my X_test. I was able to merge it with y_test but since my X_test is a corpus I'm not sure how I can identify the indexes to merge. My codes are as below
def lr_model(df):
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import pandas as pd
# Create corpus as a list
corpus = df['text'].tolist()
cv = CountVectorizer()
X = cv.fit_transform(corpus).toarray()
y = df.iloc[:, -1].values
# Splitting to testing and training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Train Logistic Regression on Training set
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
# Merge true vs predicted labels
true_vs_pred = pd.DataFrame(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))
return true_vs_pred
This gives me the y_test and y_pred but I'm not sure how I can add the X_test as an original data frame (the ids of the X_test) to this. Any guidance is much appreciated. Thanks