I am working on a text classification project using the randomForest package in R. I cannot run the prediction for my supervised learning model because the training and testing data frames do not have the same columns (text variables): some of the column names differ between them. This is the error I get:
"Error in eval(predvars, data, env) : object 'â..' not found"
I believe the object â.. is a column that was created from a strange character in the training data and that does not exist in the testing data. To fix this, I am trying to subset the testing data by the column names of the training data:
testSparse <- subset(testSparse, select = colnames(trainSparse))
However, when I run this code, I get another error.
"Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected"
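To make this easier to reproduce, here is a minimal sketch with two tiny made-up data frames standing in for my real trainSparse and testSparse (the column names are hypothetical). It suggests that subset() fails as soon as a single training column is absent from the testing data, and setdiff() shows which columns those are:

```r
# Hypothetical toy data standing in for the real trainSparse / testSparse
trainSparse <- data.frame(good = c(1, 0), bad = c(0, 1), rare = c(1, 0))
testSparse  <- data.frame(good = c(0, 1), bad = c(1, 1), other = c(1, 0))

# Columns that the training data has but the testing data lacks
missing_in_test <- setdiff(colnames(trainSparse), colnames(testSparse))
print(missing_in_test)

# Selecting by a name vector that contains a missing column raises an error;
# tryCatch() captures the message instead of stopping the script
result <- tryCatch(
  subset(testSparse, select = colnames(trainSparse)),
  error = function(e) conditionMessage(e)
)
print(result)
```

In this toy case the only training column missing from the testing data is "rare", and that one missing name is enough to make the whole selection fail.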
Am I close to figuring out the correct way to do this? Is there another way to select all of the columns from the training data, and use it to subset all of the matching columns in the testing data?
Additionally, is there a simpler way to subset all of the matching columns between the two data frames? They each have more than 1,000 columns, so doing this by hand is not practical.
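One direction I have been considering, as a rough sketch only: use intersect() to find the shared column names, or add the missing training columns to the testing data filled with zeros and then reorder. The toy data and the names shared, testShared and testAligned are hypothetical, and filling with 0 assumes the columns are term counts where 0 means the term is absent. I am not sure this is the right approach for randomForest:

```r
# Hypothetical toy data standing in for the real trainSparse / testSparse
trainSparse <- data.frame(good = c(1, 0), bad = c(0, 1), rare = c(1, 0))
testSparse  <- data.frame(good = c(0, 1), bad = c(1, 1), other = c(1, 0))

# Option 1: keep only the columns that exist in both data frames
shared <- intersect(colnames(trainSparse), colnames(testSparse))
testShared <- testSparse[, shared, drop = FALSE]

# Option 2: give the testing data every training column.
# Columns missing from the testing data are added as zeros
# (assumption: a count of 0 means the term never occurs).
missing_in_test <- setdiff(colnames(trainSparse), colnames(testSparse))
testAligned <- testSparse
for (nm in missing_in_test) {
  testAligned[[nm]] <- 0
}

# Drop test-only columns and put the rest in the training column order
testAligned <- testAligned[, colnames(trainSparse), drop = FALSE]
```

With my real data, the training data frame also holds the outcome column, so I assume that name would have to be removed from colnames(trainSparse) before doing this.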
Appreciate any help!