Subsetting data frame by column names of another data frame?

Question

I am working on a text classification project using the random forest package in R. I cannot run predictions with my supervised learning model because the training and testing data frames do not contain the same columns (text variables); some of the column names differ. This is the error I get:

"Error in eval(predvars, data, env) : object 'â..' not found"

I believe the object â.. is a column named with a strange character that is not contained in the testing data. To fix this error, I am trying to subset the testing data by the column names of the training data.

testSparse <- subset(testSparse, select = colnames(trainSparse))

However, when I run this code, I get another error.

"Error in [.data.frame(x, r, vars, drop = drop) : undefined columns selected"

Am I close to figuring out the correct way to do this? Is there another way to take all of the column names from the training data and use them to select the matching columns in the testing data?

Additionally, if applicable, is there a simpler way to keep only the columns that match between the two data frames? They each have over 1,000 columns, so it would be very tricky to do by hand.

Appreciate any help!

Please refer to guidelines on making a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — Desmond, May 31 '22 at 06:32
in base R: `testSparse[, names(testSparse) %in% names(trainSparse)]`. Note that **subset**ting refers to selecting a set of observations (rows) not variables (columns). — , May 31 '22 at 06:37
If at least one selected column from train df is not found in test df then the subset will return an error. This is why you have to select common columns. — Yacine Hajji, May 31 '22 at 07:01
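
The comments above point at selecting only the columns the two data frames share. A minimal sketch of that idea in base R follows. The toy data frames and the names `common`, `missing_in_test`, `testCommon`, and `testAligned` are made up for illustration; only `trainSparse` and `testSparse` come from the question. The zero-filling step at the end is an assumption on my part, not something suggested in the thread: it presumes the columns are term counts, so a term absent from the test set can be treated as a count of 0.

```r
# Toy stand-ins for the real document-term data frames (hypothetical data)
trainSparse <- data.frame(apple = c(1, 0), banana = c(0, 1), cherry = c(1, 1))
testSparse  <- data.frame(banana = c(1, 0), cherry = c(0, 0), durian = c(1, 1))

# Column names present in both data frames
common <- intersect(names(trainSparse), names(testSparse))

# Training columns that the test data lacks; selecting these is what
# triggers "undefined columns selected"
missing_in_test <- setdiff(names(trainSparse), names(testSparse))

# Keep only the shared columns of the test data
testCommon <- testSparse[, common, drop = FALSE]

# Assumption: columns are term counts, so a missing term can be filled with 0.
# This gives the test data exactly the training columns, in the same order.
testAligned <- testSparse
for (col in missing_in_test) {
  testAligned[[col]] <- 0
}
testAligned <- testAligned[, names(trainSparse), drop = FALSE]
```

Printing `missing_in_test` on the real data should show which training columns cause the error. If `trainSparse` also holds the response variable, it will likely appear in that list too and would need to be excluded before zero-filling.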
