Confusion Matrix Levels Error with Non-Unique Test Data

Question

I am attempting to generate a confusion matrix from predicted and actual data. I get an error saying the levels are not equal, even when both variables are converted to factors. Having checked the levels, I believe the issue is that the test data has many repeated values and therefore fewer levels than the predicted values, which are all unique. Is there a way to force the levels of the test data to match those of the predictions?

confusionMatrix(as.factor(sale.pred),as.factor(housing.test.df$SalePrice))

sale.pred holds the predicted values and housing.test.df$SalePrice holds the actual values. As stated, sale.pred has no duplicate values, so its number of levels equals the number of rows n, whereas housing.test.df$SalePrice has duplicate values, so its number of levels is less than n.
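The mismatch described above can be reproduced with a few made-up numbers. The sketch below uses base R only, and the vectors `pred` and `actual` are hypothetical stand-ins for sale.pred and housing.test.df$SalePrice, not the asker's data. It shows one possible way to give both factors the same level set, and then, as a further assumption, bins both vectors with `cut()` using arbitrary break points, since continuous predictions will rarely equal an actual price exactly.

```r
# Hypothetical stand-ins for sale.pred and housing.test.df$SalePrice
pred   <- c(101500.2, 98750.7, 120300.9, 101499.6)  # all values unique
actual <- c(100000, 100000, 120000, 100000)         # repeated values

nlevels(as.factor(pred))    # 4 levels, one per row
nlevels(as.factor(actual))  # 2 levels, fewer than the number of rows

# Build both factors over one shared level set
shared   <- sort(union(pred, actual))
pred_f   <- factor(pred,   levels = shared)
actual_f <- factor(actual, levels = shared)
identical(levels(pred_f), levels(actual_f))  # TRUE

# The table is now square, but no prediction matches a price exactly
tab <- table(Predicted = pred_f, Actual = actual_f)
sum(diag(tab))  # 0

# Assumption: binning prices into ranges (break points chosen arbitrarily)
breaks     <- c(0, 100000, 150000, Inf)
pred_bin   <- cut(pred,   breaks = breaks)
actual_bin <- cut(actual, breaks = breaks)
tab_bin    <- table(Predicted = pred_bin, Actual = actual_bin)
```

Because both factors share the same levels in either variant, they could in principle be passed to confusionMatrix() in place of the plain as.factor() calls, though whether a confusion matrix is a meaningful summary for a continuous target is a separate question.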

Check the following previously asked questions: https://stackoverflow.com/questions/30002013/error-in-confusion-matrix-the-data-and-reference-factors-must-have-the-same-nu and https://stackoverflow.com/questions/24801452/error-in-confusionmatrix-the-data-and-reference-factors-must-have-the-same-numbe — Nareman Darwish, Jan 04 '20 at 20:09
Read https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and look into the https://github.com/tidyverse/reprex package — Bruno, Jan 04 '20 at 20:34
I tried manually setting the levels between 60,000 and 755,000, which seems to be the range:

sale.pred <- predict(reg, housing.pca)
sale.pred <- as.factor(sale.pred)
actual <- as.factor(housing.test.df$SalePrice)
levels(actual) <- c(60000:755000)
levels(sale.pred) <- c(60000:755000)
confusionMatrix(sale.pred, actual)

but received the error:

Error in table(data, reference, dnn = dnn, ...) : attempt to make a table with >= 2^31 elements

— B Bow, Jan 04 '20 at 20:43
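The size of that table can be checked with a little arithmetic. The sketch below assumes only the 60000:755000 range quoted in the comment: a cross-tabulation of two factors needs one cell per pair of levels, so the cell count is the square of the number of levels.

```r
# Number of levels produced by the integer range 60000:755000
n_levels <- length(60000:755000)
n_levels        # 695001

# A two-way table needs n_levels * n_levels cells
cells <- as.numeric(n_levels)^2
cells           # roughly 4.83e11

# Compare with the limit named in the error message
cells >= 2^31   # TRUE
```

Separately, assigning to levels() renames the existing levels in order rather than matching them by value, so even a smaller range set this way would relabel the prices instead of aligning them.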


0 Answers