
Note: After lots of experimenting with the code, I have completely re-written this question

I'm trying to use user-input values in a one-row data frame to predict the user's category with randomForest, but I get an error saying that my data object contains NA/Inf values.

I have a randomForest classifier, which I've trained on a training dataset and validated on a validation dataset. This was done in my file analysis.R on GitHub, and the object is saved as rf.rds, which is read in by server.R.

In server.R I read in the training data which is called x (i.e. x.rds) and then extract only the first row into userdf.

In ui.R I let users enter values which reactively update this object:

  values <- reactiveValues()
  values$df <- userdf
  newEntry <- observe({
      values$df$bron_badges <- input$bron_badges
      values$df$silv_badges <- input$silv_badges
      values$df$gold_badges <- input$gold_badges
      values$df$reputation  <- input$reputation
      values$df$views       <- input$views
      values$df$votes       <- input$votes
  })

This appears to work. I say so because I can run:

  output$table <- renderTable({data.frame(values$df)})

and watch the values update beautifully in my UI.

However, when I run the following code to make a prediction for the user, I get an error message saying that there are NAs:

  output$results <- renderText({
                      {  ds1        <- values$df 
                         x          <- x[,sort(names(x))] 
                         ds1        <- ds1[,sort(names(ds1))] 
                         names(ds1) <- colnames(x)
                         predict(rf, newdata = data.frame(ds1))
                      }
                      })

I get this even though I "know" the data is not NA: I watched values$df update in the UI via the renderTable line above, and all of the initial values, which come from x, are non-NA. I've also tried it without the data.frame() wrapper in the predict statement.

Interestingly, if I replace the predict statement above with table(is.na(ds1)) it tells me that all 1,033 values are NA.
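
A per-column view is often more informative than one table over the whole object, because it shows which columns hold the NAs and whether any column changed type on the way in. The following is only a sketch in base R with made-up stand-ins (`train` for `x`, `new_row` for `values$df`). The two ways the row gets damaged here are illustrative assumptions, not a diagnosis of the app.

```r
# Made-up stand-ins for the training data x and the reactive row values$df
train   <- data.frame(reputation = c(10, 250), views = c(3, 40), votes = c(1, 7))
new_row <- train[1, ]
new_row$views <- "12"   # a value that arrived as text instead of a number
new_row$votes <- NA     # a value that was empty when it was assigned

# Count NA cells per column rather than over the whole object
na_per_col <- colSums(is.na(new_row))

# Compare the first class of each column with the training data
train_class <- vapply(train,   function(col) class(col)[1], character(1))
new_class   <- vapply(new_row, function(col) class(col)[1], character(1))
mismatch    <- names(train)[train_class != new_class[names(train)]]

print(na_per_col)
print(mismatch)
```

Running the same two checks on `ds1` inside the render function would show whether the NAs sit in every column or only in the ones that are overwritten from `input`.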

Also interesting, if I replace ds1 with userdf in the predict statement, then everything runs fine (userdf is the non-reactive object).

If I replace the predict statement with setdiff(colnames(x), colnames(ds1)), it does not show any mismatched column names. (It did until I added the names(ds1) <- colnames(x) statement above, because the underscores in the reactive data frame's column names were somehow being converted to dots.)
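
Because `names(ds1) <- colnames(x)` assigns names by position, it relies on both data frames sorting into the same column order, which is not guaranteed once `_` has turned into `.` in one of them. An alternative sketch, under the assumption that the only difference is dots versus underscores: repair the names first, then select the columns by the training names. `train` and `user_row` are made-up stand-ins for `x` and `values$df`.

```r
# Made-up stand-ins: same columns, different order, one dotted name
train    <- data.frame(bron_badges = c(1, 4), views = c(3, 40), votes = c(1, 7))
user_row <- data.frame(votes = 2, bron.badges = 5, views = 30)

# Assumed repair: turn every literal dot back into an underscore
names(user_row) <- gsub(".", "_", names(user_row), fixed = TRUE)

# Fail early if the two sets of names still differ
stopifnot(setequal(names(user_row), names(train)))

# Select by name so each value stays attached to its own column
user_row <- user_row[, names(train), drop = FALSE]
print(user_row)
```

Selecting by name avoids the sort-then-rename step entirely, so a value can never end up under another column's label.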

Finally, I find that if I access the names from rf via rf$forest$ncat I get "incorrect number of dimensions" as my error:

  output$results <- renderTable({
                      {  ds1        <- values$df 
                         cn         <- rf$forest$ncat
                         cn         <- cn[,sort(names(cn))] 
                         ds1        <- ds1[,sort(names(ds1))] 
                         names(ds1) <- names(cn)#x #rf$forest$xlevels
                         predict(rf, newdata = data.frame(ds1))
                      }
                      })

However, with the following modification:

  output$results <- renderTable({
                      {  ds1        <- values$df 
                         cn         <- as.data.frame(t(rf$forest$ncat))
                         cn         <- cn[,sort(names(cn))] 
                         ds1        <- ds1[,sort(names(ds1))] 
                         names(ds1) <- names(cn)#x #rf$forest$xlevels
                         predict(rf, newdata = data.frame(ds1))
                      }
                      })

the error goes back to "variables in the training data missing in newdata".
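
For reference, "incorrect number of dimensions" is the error R raises when two-index subsetting (`cn[, ...]`) is applied to an object that has no dimensions, such as a plain named vector. A small sketch with a stand-in vector follows. Treating `rf$forest$ncat` as a named vector is an assumption here, though it is consistent with the `as.data.frame(t(...))` conversion making that error go away.

```r
# Stand-in for rf$forest$ncat (assumed: a named vector, one entry per predictor)
cn <- c(votes = 1, bron_badges = 1, views = 1)

# Two-index subsetting fails on a vector because it has no dim attribute
err <- tryCatch(cn[, sort(names(cn))], error = function(e) conditionMessage(e))
print(err)

# One-index subsetting reorders the vector by name without any conversion
cn_sorted <- cn[sort(names(cn))]
print(names(cn_sorted))
```

With a vector, `names(cn)` already gives the names directly, so the conversion to a data frame is only needed if the two-index form is kept.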

Minimal, reproducible example: https://github.com/hack-r/troubleshooting_predictor_minimal

Here's the full reproducible code and data: https://github.com/hack-r/coursera_shiny

Hack-R
  • Please create a [minimal, reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Your example hardly seems minimal. This will help isolate the error. Have you got this working outside of Shiny? – MrFlick Jan 12 '15 at 20:38
  • It works fine outside of `shiny`. I'll see if I can make a more minimal version tonight, but this is pretty much the single purpose of this `shiny` app and I'm not sure what's causing the bug, so this is almost as minimal as I can go. – Hack-R Jan 12 '15 at 20:40
  • @JohnPaul It was just a guess really. I tried it with `reactive` instead of observe, but that just led to output that literally printed the `predict` statement. – Hack-R Jan 13 '15 at 05:57
  • If you want to write into val$res should you be using the global assignment <<-? – Pork Chop Jan 13 '15 at 08:43
  • @Hack-R 1) Once you make something a `reactiveValues` it is already reactive, so you don't need to wrap stuff in the `newEntry` or `runmodel` `observers`. 2) Your error could be due to different column names used in `userdata()` and the data you used to get `rf`. What was the code you used to get `rf` originally? – John Paul Jan 13 '15 at 13:50
  • @JohnPaul I have experimented with many different ways of writing this and have completely re-written the question to reflect some things I've found. Could you have another look at it? – Hack-R Jan 14 '15 at 14:20
  • @MrFlick I have created a whole new, clonable github repository with an absolutely minimal version of this error/app. Please have a look. – Hack-R Jan 14 '15 at 14:52
  • When I replace the `predict` line with just `ds1` and change the render command to `renderTable` it does not produce an error **but** also nothing shows up in the `outputTable` output. It just disappears. Same thing for `class(ds1$votes)`... – Hack-R Jan 14 '15 at 15:05
  • @Hack-R What is going on with the `colnames` statement? It appears that you first assign a `data.frame` to `ds1`, but then overwrite it with a character vector using `ds1<-colnames(x)`. Is that what you want, or do you just want `ds1` to have the same names as `x`? – John Paul Jan 14 '15 at 16:26
  • What's with the `ds1 <- colnames(x)` line? That is replacing the data.frame `ds1` with a character vector. Take that out and the predict should work. The newdata= parameter needs to be a data.frame. – MrFlick Jan 14 '15 at 16:26
  • @JohnPaul @MrFlick aaah!! That was supposed to be `names(ds1) <- colnames(x)` because the column names were mysteriously having their underscores (`_`) replaced with dots (`.`) at some point. I will try that correction right now, thanks for catching it! Update: just fixing that by itself resulted in no output. I also tried wrapping it in `data.frame`. I will correct that part of the question though, thanks. – Hack-R Jan 14 '15 at 16:43
  • I also just tried commenting out the `colnames` lines altogether and tried both `renderText` and `renderTable`. – Hack-R Jan 14 '15 at 16:51
  • OK, one more thing: try making it `ds1<-isolate(values$df)` and see if that gets you anywhere (I don't think it will update, but it should show up). – John Paul Jan 14 '15 at 17:25
  • @JohnPaul Interesting... so when I do that it goes back to telling me that variable names in the training data are missing in the newdata -- which was happening before but was solved with the `colnames` statements -- but when I `renderPrint` on `setdiff(colnames(ds1), colnames(x) )` it tells me `character(0)` – Hack-R Jan 14 '15 at 17:38
  • @JohnPaul It said `character(0)` again. When I do the `isolate` with `predict` it gives me the same error about missing variables regardless of if I use `renderTable` or `renderPrint` – Hack-R Jan 14 '15 at 17:52
  • Sorry, I meant do `renderTable` on `ds1`, not the prediction, to make sure you have the correct data. – John Paul Jan 14 '15 at 17:55
  • correction: That works if there's not an isolate statement but the table disappears if there is an isolate statement. – Hack-R Jan 14 '15 at 18:49
  • @JohnPaul @MrFlick I find that I get "incorrect number of dimensions" as my error if I do `cn <- rf$forest$ncat`, `cn <- cn[,sort(names(cn))]`, then `names(ds1) <- names(cn)`... – Hack-R Jan 21 '15 at 21:03
  • Any resolution to this problem Hack-R? I'm having the exact same issue and thought I'd check before creating a new question. – Liam Flynn Dec 16 '16 at 01:44
  • @LiamFlynn Sorry it's been so long I don't remember but I do think I got it fixed. Have a look at the code on my GitHub repo and you may be able to fix your problem – Hack-R Dec 16 '16 at 02:00

0 Answers