3

The following is some slightly modified code from the glmRob() examples. When given the newdata argument, predict.glmRob() errors out. Am I doing something wrong?

suppressMessages(library(robust))
data(breslow.dat)
bres.rob <- glmRob(sumY ~ Age10 + Base4 * Trt, family = poisson(), data = breslow.dat)
predict(bres.rob, newdata = breslow.dat)

Error in NextMethod("predict"): no method to invoke

devtools::session_info()
#> ─ Session info ─────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.3 (2020-10-10)
#>  os       macOS Catalina 10.15.7      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       America/Los_Angeles         
#>  date     2020-12-14                  
#> 
#> ─ Packages ─────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
#>  callr         3.5.1   2020-10-13 [1] CRAN (R 4.0.2)
#>  cli           2.2.0   2020-11-20 [1] CRAN (R 4.0.3)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
#>  DEoptimR      1.0-8   2016-11-19 [1] CRAN (R 4.0.0)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 4.0.0)
#>  devtools      2.3.2   2020-09-18 [1] CRAN (R 4.0.2)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.2)
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
#>  fit.models  * 0.64    2020-08-02 [1] CRAN (R 4.0.2)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
#>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.0)
#>  knitr         1.30    2020-09-22 [1] CRAN (R 4.0.2)
#>  lattice       0.20-41 2020-04-02 [1] CRAN (R 4.0.3)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.2)
#>  MASS          7.3-53  2020-09-09 [1] CRAN (R 4.0.3)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 4.0.0)
#>  mvtnorm       1.1-1   2020-06-09 [1] CRAN (R 4.0.0)
#>  pcaPP         1.9-73  2018-01-14 [1] CRAN (R 4.0.0)
#>  pkgbuild      1.1.0   2020-07-13 [1] CRAN (R 4.0.0)
#>  pkgload       1.1.0   2020-05-29 [1] CRAN (R 4.0.0)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.0)
#>  processx      3.4.5   2020-11-30 [1] CRAN (R 4.0.2)
#>  ps            1.5.0   2020-12-05 [1] CRAN (R 4.0.2)
#>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.2)
#>  remotes       2.2.0   2020-07-21 [1] CRAN (R 4.0.2)
#>  rlang         0.4.9   2020-11-26 [1] CRAN (R 4.0.2)
#>  rmarkdown     2.5     2020-10-21 [1] CRAN (R 4.0.2)
#>  robust      * 0.5-0.0 2020-03-08 [1] CRAN (R 4.0.0)
#>  robustbase    0.93-6  2020-03-23 [1] CRAN (R 4.0.0)
#>  rprojroot     2.0.2   2020-11-15 [1] CRAN (R 4.0.2)
#>  rrcov         1.5-5   2020-08-03 [1] CRAN (R 4.0.2)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
#>  testthat      3.0.0   2020-10-31 [1] CRAN (R 4.0.2)
#>  usethis       1.6.3   2020-09-17 [1] CRAN (R 4.0.2)
#>  withr         2.3.0   2020-09-22 [1] CRAN (R 4.0.2)
#>  xfun          0.19    2020-10-30 [1] CRAN (R 4.0.2)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

Created on 2020-12-14 by the reprex package (v0.3.0)

Worth noting that the exact same thing does work with regular glm().

suppressMessages(library(robust))
data(breslow.dat)
bres.glm <- glm(sumY ~ Age10 + Base4 * Trt, family = poisson(), data = breslow.dat)
predict(bres.glm, newdata = breslow.dat)
#>        1        2        3        4        5        6        7        8 
#> 2.957756 2.933407 2.704879 3.015431 3.913226 3.250763 2.979112 4.101214 
#>        9       10       11       12       13       14       15       16 
#> 3.360129 2.863352 3.955120 3.257157 2.912460 3.741554 4.459110 3.668917 
#>       17       18       19       20       21       22       23       24 
#> 3.034205 5.093412 3.131601 2.906475 2.930414 2.671553 3.110244 3.174723 
#>       25       26       27       28       29       30       31       32 
#> 3.873096 3.134184 2.644211 3.507451 3.917288 3.375050 2.641300 2.675629 
#>       33       34       35       36       37       38       39       40 
#> 2.592602 2.854897 3.163672 2.890335 2.625822 3.756825 3.201280 2.557211 
#>       41       42       43       44       45       46       47       48 
#> 2.784067 2.988840 3.585320 3.060731 3.448097 2.484164 3.182476 2.577124 
#>       49       50       51       52       53       54       55       56 
#> 5.757692 3.003209 3.274328 3.308657 3.525533 3.268830 2.863768 2.857114 
#>       57       58       59 
#> 2.805090 2.891444 2.892553

Created on 2020-12-14 by the reprex package (v0.3.0)

user20650
  • 24,654
  • 5
  • 56
  • 91
Rory Nolan
  • 972
  • 10
  • 15

1 Answers1

2

Just take out the newdata argument since you're running it on the same data.

From the robust docs here

newdata - optionally, a data frame in which to look for variables with which to predict. If omitted, the fitted linear predictors are used.

Edited to add

One of the listed authors replied and recommended using the package "robustbase" (whom he's the maintainer of) as opposed to "robust" because the methods used in the former are more modern, as well as backed by more extensive testing and examples.

Here's my sample code for a quick comparison between the two. Note that the 'r' in robustbase's glmrob is lowercase.

library(robustbase)
library(robust)
data(breslow.dat)

## Comparison of methods.

bres.robustbase <- glmrob(sumY ~ Age10 + Base4 * Trt, family = poisson, data = breslow.dat)
predict(bres.robustbase, newdata = breslow.dat)

bres.robust <- glmRob(sumY ~ Age10 + Base4 * Trt, family = poisson(), data = breslow.dat)
predict(bres.robust)

## predict handling test data on robustbase's glmrob object.

lastIndex = round(nrow(breslow.dat)*0.7)
train = breslow.dat[1:lastIndex,]
test = breslow.dat[(lastIndex + 1):nrow(breslow.dat),]

bres.robustbase <- glmrob(sumY ~ Age10 + Base4 * Trt, family = poisson, data = train)
predict(bres.robustbase, newdata = test)

Notice that you'll get different coefficient estimates, and therefore predictions, between glmrob and glmRob. Deciding which one's more accurate is beyond my knowledge at the moment (hopefully not for you), but I've followed up with our author to see if he can give a high-level explanation that can be posted here.

Econundrums
  • 182
  • 1
  • 10
  • 1
    this is interesting but might not actually solve the OP's problem. (I spent a little while digging around in S3 method dispatch etc. etc. but didn't manage to get very far ...) – Ben Bolker Dec 20 '20 at 21:16
  • 1
    Yes, it looks like the NextMethod is not getting passed through properly & perhaps `family(object)$inverse` should be `family(object)$linkinv`. I think a bug should be raised with `maintainer("robust")` – user20650 Dec 21 '20 at 14:09
  • If OP's question pertains to why can't he pass a dataframe through `newdata` period, I agree (worth reaching out to the package authors). It looks, however, that he was just trying to replicate the example given by runnig `predict` on the *same entire set* of the data, which makes using the `newdata` argument unnecessary. – Econundrums Dec 21 '20 at 19:48
  • 1
    So just to clarify. I want in general to use the `newdata` argument to predict on fresh data. Here I just passed the fitting data into `newdata` to show that the model wasn't even able to predict on its own data (to clarify that whatever issue there was was nothing to do with the format of the data I was passing). – Rory Nolan Dec 21 '20 at 19:51
  • Copy. Thanks for the clarification. I'll reach out to the package authors. – Econundrums Dec 21 '20 at 19:55
  • 1
    @Econundrums ; thanks for following up. Re the coefficients, on my pc the coefficients are unstable under `glmRob` i.e. they change wildly on different runs , and hence can get some wild predictions. – user20650 Dec 22 '20 at 19:31
  • 1
    @user20650 I guess all the more reason to use robustbase then ;) – Econundrums Dec 22 '20 at 21:16