0

Here is my code, but I have this error ..

model <- glmrob(x ~ as.factor(i) + as.factor(j), 
  family = poisson(link = "log"), data = myData,method="MT")

error message:

Error in optim(start, sumaConPesos, method = "L-BFGS-B", x = x, y = y,  : non-finite value supplied by optim
enden
  • 163
  • 3
  • 10
  • 1
    Can you please include data and/or code that will provide us with a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) ? – Ben Bolker Dec 21 '16 at 04:57
  • I just add with the data, thanks – enden Dec 21 '16 at 05:30
  • what library are GenIns and glmrob from? They are not base-R, so this isn't reproducible – scoa Dec 21 '16 at 10:22
  • `glmrob`is from `library("robust base")`, and `GenIns` is a data from another library `ChainLadder` – enden Dec 21 '16 at 12:26
  • This is getting closer, but needs to be edited further to be a reproducible example. (1) What are `i` and `j` ? (2) when you call `data()` (as in `myData <- data(GenIns,package="ChainLadder")` what is returned is the *name* of the data object, not a data frame. (3) `GenIns` is a distance matrix. Did you mean `GenInsLong` (just guessing, are `i` and `j` the `accyear` and `devyear` variables?) – Ben Bolker Dec 21 '16 at 18:46
  • I can reproduce your error with `names(GenInsLong)[3] <- "claims"; robustbase::glmrob(claims~factor(accyear)+factor(devyear), family=poisson,method="MT", data=GenInsLong)` ... can you edit your question accordingly? – Ben Bolker Dec 21 '16 at 18:47
  • ok, Sorry :) this the data in form of the GLM fitting, [myData](https://github.com/nmeraihi/notebooks/blob/master/data/genInsGLM.csv) – enden Dec 21 '16 at 18:50
  • those data appear to be identical to `GenInsLong` except that you've subtracted 1 from the indices. – Ben Bolker Dec 21 '16 at 21:25
  • If at all possible, you should [avoid giving an external (off-site) link to your data file](http://meta.stackoverflow.com/questions/306067/what-is-the-recommended-place-to-store-attachments); it's best to try to follow some of the [other suggestions for constructing reproducible examples](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) that don't rely on off-site resources. – Ben Bolker Dec 21 '16 at 21:35

1 Answers1

3

I strongly suspect this is an example of Andrew Gelman's folk theorem of statistical computing:

When you have computational problems, often there’s a problem with your model

If I replicate your example 100 times I get errors every time, but different errors different times:

library(robustbase)
data("GenInsLong",package="ChainLadder")
names(GenInsLong)[3] <- "claims"
tt <-
    replicate(100,
        {cat(".")
         try(robustbase::glmrob(claims~factor(accyear)+factor(devyear),
                   family=poisson,method="MT", data=GenInsLong),
           silent=TRUE)
         })
table(tt)

gives 80 instances of your "non-finite value supplied by optim" error and 20 cases of "Error in solve.default(de/n): Lapack routine dgesv: system is exactly singular: U[x,x] = 0" for a variety of values of x.

I've dug down within the function, using debug on various levels. The problem seems to be that the initialization procedure (glmrobMT -> beta0IniCP -> betaExacto) randomly samples (without replacement) a number of rows from the data equal to the number of columns of the model matrix (19) and tries to fit a poisson GLM to it. Because of the structure of your data, these subsamples are almost always multicollinear, leading to NA values in the coefficients, which screws things up down the line.

I don't know how to fix this, but I would ask the following general modeling questions:

  • are you sure it makes sense to fit a model with 19 parameters to a data set with 55 observations? The usual liberal rule of thumb is N/p > 10, this is N/p approx 2 ...
  • are you sure it makes sense to fit a Poisson model, even a robust one, to a response with a mean value of about 2.5 million? The implied coefficient of variation for a Poisson model with mean 2.5 million is sqrt(V)/mean = 1/sqrt(mean) = 0.0006. Even though these are counts, they're so large that I'd suggest a model based on a continuous distribution ...
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Well, when I use glmrob with the Huber Psi, I Have the expected result and every thing works great. Now, it seems that the problem is inly from the choice of the M-estimator, I think I have to investigate more and more... – enden Dec 22 '16 at 01:10
  • Thank you very much for your help, if I have some thing new I'll let you know. – enden Dec 22 '16 at 01:15