R is telling me that there are new levels in my factor variable, but I don't see that when I print out the levels. "Electronic" is a level in both the test and training data sets. The run log is below. I also tried a different subset of the data for the Predict stage and got a different set of levels listed as new. Any thoughts?

> levels(trainingData$Genre)
 [1] "Alternative"        "Christian & Gospel" "Country"           
 [4] "Dance"              "Electronic"         "Hip Hop / Rap"     
 [7] "Hip-Hop"            "Pop"                "Pop in Spanish"    
[10] "R&B / Soul"         "Rap"                "Rock"              
[13] "Soul"               "Soundtrack"        
> levels(testData$Genre)
 [1] "Alternative"        "Christian & Gospel" "Country"           
 [4] "Dance"              "Electronic"         "Hip Hop / Rap"     
 [7] "Hip-Hop"            "Pop"                "Pop in Spanish"    
[10] "R&B / Soul"         "Rap"                "Rock"              
[13] "Soul"               "Soundtrack"        
> testData$Genre[id] <- NA
> # Predict
> rankPred <- predict(lmMod, testData)  
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor Genre has new levels Electronic
Calls: predict -> predict.lm -> model.frame -> model.frame.default
Execution halted
  • What does your model look like? It would be best to provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). What does `table(trainingData$Genre)` look like? Were there Electronic records in there? – MrFlick Mar 31 '17 at 19:30
  • Not sure how I can show my model and the data. I see no place to attach files and the comments only allow a small number of characters. I fiddled with the droplevels() function, but same error. Perhaps I am using it wrong. – Chris Hammond Mar 31 '17 at 19:53
  • Here is my model command lmMod <- lm(Rank ~ Genre + iTunesRank, data=trainingData) – Chris Hammond Mar 31 '17 at 19:54
  • @ChrisHammond, your solution is in your last comment. Electronic occurs zero times in your training data. – vincentmajor Mar 31 '17 at 20:03
  • How is your data split into training/testing? You won't be able to predict anything for records with `Genre == Electronic` and I would advise against predicting for the other three genres with only one record in the training data. – vincentmajor Mar 31 '17 at 20:05
  • `table(testData$Genre)` gives: Alternative 4, Christian & Gospel 1, Country 11, Dance 5, Electronic 1, Hip Hop / Rap 4, Hip-Hop 2, Pop 8, Pop in Spanish 0, R&B / Soul 4, Rap 0, Rock 0, Soul 0, Soundtrack 0 – Chris Hammond Mar 31 '17 at 20:05
  • Thanks - yes that was the problem. Electronic appears zero times in the training data. – Chris Hammond Mar 31 '17 at 20:12
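
For anyone landing here with the same error, the diagnosis from the comments can be reproduced with a small sketch. The data below is made up for illustration (only the column names and the `lm()` formula come from the question), and setting unseen genres to `NA` is just one possible workaround, not necessarily the right choice for your data. The point is that `levels()` lists every declared level, including ones with zero rows, while `lm()` drops unused levels when fitting, so `lmMod$xlevels` is the thing to compare against.

```r
# Hypothetical toy data; only the column names mirror the question.
genres <- c("Dance", "Electronic", "Pop")
trainingData <- data.frame(
  Rank       = c(1, 4, 2, 6, 3, 5),
  iTunesRank = c(2, 5, 1, 6, 4, 3),
  Genre      = factor(c("Dance", "Pop", "Dance", "Pop", "Dance", "Pop"),
                      levels = genres)
)
testData <- data.frame(
  iTunesRank = c(1, 2, 3),
  Genre      = factor(c("Dance", "Electronic", "Pop"), levels = genres)
)

levels(trainingData$Genre)  # lists "Electronic" although no training row uses it
table(trainingData$Genre)   # the count for Electronic is 0

lmMod <- lm(Rank ~ Genre + iTunesRank, data = trainingData)
lmMod$xlevels$Genre         # levels the model was actually fitted with

# Genres that occur in the test rows but that the model never saw
unseen <- setdiff(unique(as.character(testData$Genre)), lmMod$xlevels$Genre)

# One workaround (an assumption, not the only option): blank out those rows
# so that predict() returns NA for them instead of stopping with an error.
testData$Genre[testData$Genre %in% unseen] <- NA
rankPred <- predict(lmMod, testData)
```

Checking `table()` on the training set, or comparing against `lmMod$xlevels`, before calling `predict()` shows which genres have no training rows. A stratified train/test split would avoid the situation altogether.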
