Tobit : Error in lm.fit(X.vlm, y = z.vlm, ...) : NA/NaN/Inf in 'x' (simulated data)

Question

I'm conducting a Monte Carlo study. I have a linear model with heteroskedasticity and left censoring of the dependent variable at 0. The mean of censoring rates is 25.9.

I get the error

Error in lm.fit(X.vlm, y = z.vlm, ...) : NA/NaN/Inf in 'x'

after trying to estimate a tobit model.

vglm(y[i,]~X[1,i,]+X[2,i,]+X[3,i,]+X[4,i,],family=tobit(Lower=0))

My data are simulated from standard distribution so the problem shoudn't come from odd variables.

I found two other questions that had the same problem with real data : lm() NA/NaN/Inf error , lm() NA/NaN/Inf error But there didn't seem to be any satisfying answers. Besides my data are easily reproducible so it should help identifying the problem

Here are the codes :

library(VGAM)
set.seed(12345)
nobs=100
nsim=100
b=c(2,-2,-3,3)
g=c(1,0.2)
y=matrix(rep(0,nobs*nsim),ncol=nobs,nrow=nsim)
X=array(0,dim=c(4,nsim,nobs))
res=matrix(rep(0,nobs*nsim),ncol=nobs,nrow=nsim)
tobit=vector(mode="list",length=nsim) 
for(i in 1:nsim){
# generate covariates :
X[1,i,]=rlnorm(n=nobs)           
X[2,i,]=runif(n=nobs)<=.75   
X[3,i,]=rnorm(mean = 3,n=nobs)      
X[4,i,]=runif(n=nobs,min=0,max=10)  
res[i,]=(g[1]+g[2]*X[4,i,])*rnorm(n=nobs)   
# generate censored dependent variable
y[i,]=b[1]*X[1,i,]+b[2]*X[2,i,]+b[3]*X[3,i,]+b[4]*X[4,i,]+res[i,]
y[i,]=sapply(y[i,],FUN=function(x){max(0,x)}) #apply censoring
tobit[[i]]<-vglm(y[i,]~X[1,i,]+X[2,i,]+X[3,i,]+X[4,i,],
                              family = tobit(Lower=0)) 
}

Here is the traceback

traceback()
5: lm.fit(X.vlm, y = z.vlm, ...)
4: vlm.wfit(xmat = X.vlm.save, z, Hlist = NULL, U = U, matrix.out =FALSE, 
   is.vlmX = TRUE, qr = qr.arg, xij = NULL)
3: vglm.fitter(x = x, y = y, w = w, offset = offset, Xm2 = Xm2, 
   Ym2 = Ym2, etastart = etastart, mustart = mustart, coefstart =coefstart, 
   family = family, control = control, constraints = constraints, 
   criterion = control$criterion, extra = extra, qr.arg = qr.arg, 
   Terms = mt, function.name = function.name, ...)
2: vglm(y[1, ] ~ X[1, 1, ] + X[2, i, ] + X[3, i, ] + X[4, i, ], 
   family = tobit(Lower = 0))
1: traceback(vglm(y[1, ] ~ X[1, 1, ] + X[2, i, ] + X[3, i, ] + X[4, 
   i, ], family = tobit(Lower = 0)))

*** Edit :

By removing one covariate (I tried with X[3,i,] and X[4,i,]) and setting the lower censoring at -0.001 as BondedDust suggest, It works fine and I even push the number of replications to 1000 without major problems.

By just setting the lower censoring at -0.001, and keeping all the covariates, I get two errors out of 100 iterations. It is worth noting that the error is now

Error in lm.fit(X.vlm, y = z.vlm, ...) : NA/NaN/Inf in 'y'

Besides I get these warnings

In vglm.fitter(x = x, y = y, w = w, offset = offset, Xm2 = Xm2,  ... :
iterations terminated because half-step sizes are very small

`options(error=browser)` shows that this is failing on the first iteration. You should try to get it to work at least once before complicating matters with a loop. — IRTFM, Oct 02 '15 at 17:28

IRTFM · Answer 1 · 2015-10-02T18:56:37.323

I noticed that this was reproducibly failing at i=1 so thought there might be a problem with the vglm call itself. Looking at the examples in ?tobit I added some parameters related to censored distributions, and started to get a few extra iterations. I then tried narrow the range of censoring and got more success with failure only 10% of the time. So I finally added a try() wrapper to let the loop iterate without stopping calculations and got a majority of successful runs with:

for(i in 1:nsim){

X[1,i,]=rlnorm(n=nobs)           
X[2,i,]=runif(n=nobs)<=.75   
X[3,i,]=rnorm(mean = 3,n=nobs)      
X[4,i,]=runif(n=nobs,min=0,max=10)  
res[i,]=(g[1]+g[2]*X[4,i,])*rnorm(n=nobs)   

y[i,]=b[1]*X[1,i,]+b[2]*X[2,i,]+b[3]*X[3,i,]+b[4]*X[4,i,]+res[i,]
y[i,]=pmax(0,y[i,])
tobit[[i]]<-try( vglm(y[i,]~X[1,i,]+X[2,i,]+X[3,i,]+X[4,i,], crit = "coeff", 
                              family = tobit(Lower=-.001, Upper=30, type.f = "cens")) )

}

Notice above that I replace your clunky and perhaps inefficient sapply( ... max) with the equivalent pmax.

> table( sapply(tobit, class))

try-error      vglm 
       12        88

You can loop through the successful returns with:

sapply( tobit[ sapply(tobit, class) == "vglm"],  coefficients)

Top of results:

                    [,1]      [,2]      [,3]       [,4]       [,5]       [,6]
(Intercept):1  2.8460081  1.910137  1.672237  1.2888827  2.4970536  1.0006290
(Intercept):2  0.9183935  1.042424  1.094658  0.9767228  0.9263946  0.9250609
X[1, i, ]      1.7777788  1.880506  1.662835  1.6204394  1.4412304  1.6275208
X[2, i, ]     -3.0847792 -0.453110 -1.152709 -0.9900163 -2.4705355 -0.9651577
X[3, i, ]     -2.4272169 -2.094114 -2.314748 -2.4628501 -1.9001385 -2.1076416
X[4, i, ]      2.6225234  2.245107  2.460182  2.7027493  2.3653673  2.3841989
                    [,7]       [,8]      [,9]      [,10]     [,11]      [,12]
(Intercept):1  0.9520376  1.6319010  1.572563  1.4709517  1.616158  2.4992492
(Intercept):2  0.8698777  0.9005506  1.147485  0.9285724  1.012186  0.9229233
X[1, i, ]      1.6483879  1.6789573  1.718641  1.6544123  1.599116  1.7204001
X[2, i, ]     -0.3718720 -1.8690782 -2.408657 -1.7278915 -1.208939 -2.0037999
X[3, i, ]     -2.2601637 -1.9118288 -2.359274 -1.7828438 -2.257556 -2.3778443
X[4, i, ]      2.5381367  2.3091630  2.583869  2.3582418  2.333988  2.4389336

After getting this modest degree of success I tried setting the Lower back to 0 and got all errors. Increasing the Upper value did not seem to affect success rates in limited testing. I'm unable to explain those findings, but perhaps the package author could be consulted.

Thank you for your answer and your explanations, I'm learning a lot ! Yes I should have mention that by removing some covariates I managed to get more than one iteration. Editing my post right now. — Victor, Oct 03 '15 at 07:24

Tobit : Error in lm.fit(X.vlm, y = z.vlm, ...) : NA/NaN/Inf in 'x' (simulated data)

1 Answers1