
I created a model using the gls function of the nlme package in R. I then attempted to add a fixed variance structure to the model using the weights argument.

However, I get an error about memory allocation that just seems, well, extreme.

Error in glsEstimate(object, control = control) : 'Calloc' could not allocate memory (18446744073709551616 of 8 bytes)

Any suggestions about what to do with this??


Context:

  • My Code:

    mod <- read.csv('mod.ht.dat.csv', header = TRUE)
    dim(mod)
    [1] 90826     8
    
    library(nlme)
    lm3 <- gls(HT ~ D * I(D^2), data = mod, na.action = na.omit, method = 'ML')
    vf1Fixed <- varFixed(~D)
    lm2 <- update(lm3, . ~ ., weights = vf1Fixed)
    Error in glsEstimate(object, control = control) : 
      'Calloc' could not allocate memory (18446744073709551616 of 8 bytes)
    

    -- Note: model format is from Zuur et al. (2009).

  • My memory usage (using code from here) and memory limit:

    > lsos()
                     Type     Size    PrettySize  Rows Columns
    lm3               gls 12361512 [1] "11.8 Mb"    16      NA
    mod.ht.dat data.frame  4002768  [1] "3.8 Mb" 90826       8
    vf1Fixed     varFixed     1024    [1] "1 Kb"     0      NA
    
    > memory.limit()
    [1] 8182
    
  • Session Info:

    R version 3.3.1 (2016-06-21)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1
    

The requested allocation seems ridiculously high for what I'm doing. (In fact, 18446744073709551616 is exactly 2^64, which looks more like an integer overflow than a genuine memory request.)

I've modified the code by calling gls directly (vs. update), I've placed varFixed both inside and outside of the model call itself, I've created a new variable for D^2 before the model call, I've cleared my memory, I've restarted my computer, etc. Nothing seems to bring this HUGE number down.

Is it possible that adding this fixed variance structure to the model really is that memory intensive? Or is perhaps something else going on here that I'm missing?


UPDATE:

As requested in comments:

>traceback()

8: glsEstimate(object, control = control)
7: Initialize.glsStruct(glsSt, dataMod, glsEstControl)
6: Initialize(glsSt, dataMod, glsEstControl)
5: gls(model = HT ~ D + I(D^2) + D:I(D^2), data = mod, method = "ML", 
   na.action = na.omit, weights = vf1Fixed)
4: eval(expr, envir, enclos)
3: eval(call, parent.frame())
2: update.gls(lm3, . ~ ., weights = vf1Fixed)
1: update(lm3, . ~ ., weights = vf1Fixed)

>dput(head(mod,5))

structure(list(HT = c(3.7, 8.7, 10.1, 4, 8.7), SPEC = structure(c(53L, 
53L, 53L, 53L, 53L), .Label = c("ACBA", "ACER", "ACRU", "AESY", 
"AIAL", "ALJU", "AMAR", "BENI", "CACA", "CACO", "CACR", "CAFL", 
"CAGL", "CAOL", "CAOV", "CAPA", "CARY", "CATO", "CECA", "CELA", 
"CEOC", "CHVI", "COFL", "CRAT", "CRMA", "DIVI", "ELPU", "ELUM", 
"EUAM", "FAGR", "FRAX", "GLTR", "HAVI", "ILAM", "ILDE", "ILOP", 
"JUNI", "JUVI", "LIBE", "LIJA", "LISI", "LIST", "LITU", "LOMA", 
"MAGR", "MATR", "MORU", "NYSY", "OSVI", "OXAR", "PATO", "PIEC", 
"PITA", "PIVI", "PLOC", "PRSE", "QUAL", "QUCO", "QUER", "QUFA", 
"QULY", "QUMA", "QUPH", "QURG", "QURU", "QUST", "QUVE", "RHCO", 
"SAAL", "STGR", "ULAL", "ULAM", "ULRU", "UNKN", "VAAR", "VACC", 
"VACO", "VAST", "VIAC", "VIBR", "VIPR", "VIRA", "VIRU"), class = "factor"), 
    D = c(4.1, 6.9, 7.4, 6.9, 13.7), plot = c(4L, 4L, 4L, 4L, 
    4L), tree_age = c(9L, 13L, 16L, 9L, 13L), Year = c(1933L, 
    1937L, 1940L, 1933L, 1937L), StaticLineID = c(1L, 1L, 1L, 
    2L, 2L), D2 = c(16.81, 47.61, 54.76, 47.61, 187.69)), .Names = c("HT", 
"SPEC", "D", "plot", "tree_age", "Year", "StaticLineID", "D2"
), row.names = c(NA, 5L), class = "data.frame")

Update 2:

Just to note: I tried applying a completely different type of variance structure to my data to see how my computer handled what I assumed would be a comparably memory-intensive procedure.

  • This time I added varIdent variance structure:

    >vf2 <- varIdent(form = ~ 1 | SPEC)
    >lm22 <- update(lm3, . ~ ., weights = vf2)
    

Although it took forever to run (and ended with a convergence error), it did not immediately produce the memory-allocation error the way the earlier varFixed call did.

  • Could you also include output of `traceback()` and if possible `dput(head(mod,5))` to glimpse at data structure – Silence Dogood Oct 18 '16 at 03:52
  • @Osssan : See update. I've added output from both `traceback` and `dput`. – theforestecologist Oct 18 '16 at 04:05
  • Try what happens if you use a third-degree orthogonal polynomial (see the poly function) instead of this strange interaction between a linear and a quadratic effect of the same variable. Your model doesn't seem very sensible. – Roland Oct 18 '16 at 06:10
  • @Roland: Even when I remove the "interaction" between linear and quadratic effects, I still get the same error. – theforestecologist Oct 18 '16 at 18:32
  • Can you provide a reproducible example? I don't think you'll get an answer without one. – Roland Oct 18 '16 at 18:44

1 Answer


SOLUTION: Remove 0-values from the variance covariate

Well, I'm still not sure why it's happening (though I suspect a closer look at varFixed's implementation would tell me), but I found the issue.

There were 3 instances in which D = 0.

(More generally, there were 0 values in the variable, the so-called variance covariate, that I was trying to use to generate my fixed variance structure.)

Once I removed these 3 trees with 0-values from my training data, the model ran as expected (and almost immediately).

[Note: these trees all represent data collection errors, so it was ok to "throw them out"].
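For reference, here is a minimal sketch of the fix. The data frame and column names mirror the dput output above, but the values (including the zero-D row) are invented to reproduce the symptom: varFixed(~D) sets Var(e_i) proportional to D_i, so a row with D = 0 implies zero residual variance, i.e. an infinite weight, which is presumably what trips up glsEstimate.

```r
library(nlme)  # nlme ships with R as a recommended package

# Toy data mirroring the question's columns; the last row has the
# problematic D = 0 (a hypothetical stand-in for the 3 bad trees).
mod <- data.frame(
  HT = c(3.7, 8.7, 10.1, 4.0, 8.7, 1.2),
  D  = c(4.1, 6.9,  7.4, 6.9, 13.7, 0.0)
)

sum(mod$D == 0)  # count the offending rows first

# Drop rows where the variance covariate is 0, then refit.
mod_clean <- subset(mod, D > 0)

lm2 <- gls(HT ~ D + I(D^2), data = mod_clean,
           weights = varFixed(~D), method = "ML")
```

With the zero-valued covariate rows removed, the fit completes immediately, matching the behavior described above.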
