
I am running a `dredge` of a linear mixed-effects model in the MuMIn package in R. The model is quite big (see below).

```r
library(lme4)

## Quadratic terms need I() inside a formula; a bare (x^2) is just x.
Monster <- lmer(Fw.FratioFall ~ Average_mintemp_winter + I(Average_mintemp_winter^2)
                + percentage_woody_coverage + I(percentage_woody_coverage^2)
                + kmRoads.km2 + I(kmRoads.km2^2) + Fracking
                + WELLS_ACTIVED + I(WELLS_ACTIVED^2) + BadlandsCoyote.1000_mi
                + I(BadlandsCoyote.1000_mi^2) + cougar_presence + COYOTE_springsurveys
                + I(COYOTE_springsurveys^2) + d3.1 + I(d3.1^2) + WT_DEER_springsurveys
                + I(WT_DEER_springsurveys^2) + WT_DEER_fallsurveys + I(WT_DEER_fallsurveys^2)
                + ELK_springsurveys + I(ELK_springsurveys^2) + ELK_fallsurveys + I(ELK_fallsurveys^2)
                + BadlandsCoyote.1000_mi * WELLS_ACTIVED + BadlandsCoyote.1000_mi * d3.1
                + BadlandsCoyote.1000_mi * WELLS_ACTIVED * d3.1
                + cougar_presence * percentage_woody_coverage
                + COYOTE_springsurveys * WELLS_ACTIVED
                + percentage_woody_coverage * cougar_presence * COYOTE_springsurveys
                + Average_mintemp_winter * COYOTE_springsurveys
                + Average_mintemp_winter * BadlandsCoyote.1000_mi
                + year + (1 | YEAR) + (year | StudyArea),
                REML = FALSE, data = mydata)
```
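For context, the dredge call is essentially the following (a minimal sketch, not my exact options; MuMIn refuses to dredge a global model unless `na.action` is set to fail on missing data):

```r
library(MuMIn)

## dredge() requires the global model to have been fitted with na.action = na.fail,
## so that all candidate models are fitted to exactly the same rows of data.
options(na.action = "na.fail")

all_models <- dredge(Monster)
```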

The `dredge` function has been running for 15 hours now on an i7 processor, and I am wondering if this is normal. What kind of time frame should I expect for a model this size?

I have checked the logs and R hasn't crashed; the dredge is still running, and it is producing "singular fit" messages in abundance:

```
singular fit
singular fit
singular fit
singular fit
singular fit
singular fit
```
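Those messages come from lme4 rather than MuMIn: a "singular fit" means one or more random-effect variance components were estimated at (or very near) zero for that candidate model. As a sketch, any single fit can be checked like this:

```r
library(lme4)

## TRUE if some variance components collapsed to the boundary (near zero)
isSingular(Monster, tol = 1e-4)

## Inspect which random-effect variances are (close to) zero
VarCorr(Monster)
```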

I tested a smaller model, and the `dredge` call took approximately a minute.

Smaller model

```r
sample <- lmer(Fw.FratioFall ~ Average_mintemp_winter + I(Average_mintemp_winter^2)
               + year + (1 | YEAR) + (year | StudyArea),
               REML = FALSE, data = mydata)
```

Can anyone advise on time frames for a model dredge with the MuMIn package? Thank you.

Kilian Murphy
  • As far as I understand it, the dredge function tries every combination, which in your case is a huge number. Do not do this. Use your domain knowledge to select your variables rather than rely on some automatic procedure. – user2974951 Dec 17 '18 at 11:02
  • This is at least N = 2^41 models, if I counted the names in the formula right. I would definitely suggest reducing the number of combinations by adding some limiting criteria (_via_ `subset`). If you want to see the progress bar, use `dredge(..., trace = 2)`. You can also use parallel computation with `pdredge`; see the first sketch after this comment thread. – Kamil Bartoń Dec 17 '18 at 11:09
  • The data set is enormous, so this is, believe it or not, the whittled-down version using the predictors that make the most sense ecologically. – Kilian Murphy Dec 17 '18 at 11:12
  • Can I use a subset to limit the number of models based on AIC? – Kilian Murphy Dec 17 '18 at 11:12
  • How long do you foresee it taking to dredge the initial model? – Kilian Murphy Dec 17 '18 at 11:13
  • @KilianMurphy (1) Information criteria are calculated from a fitted model's likelihood, so obviously you cannot exclude a model a priori on that basis. (2) Time the fitting of the global model and a null model and multiply the sum by 2^41 / 2 to get a rough estimate; see the second sketch below. – Kamil Bartoń Dec 17 '18 at 13:38
  • thanks for the help – Kilian Murphy Dec 17 '18 at 14:12
  • What did you end up doing to resolve this? `glmulti` was/is an option. – Kwame Mar 04 '21 at 22:40
  • We instead ran model selection using a priori knowledge to select different variable combinations, and used AIC to select the top-ranked models. We then used the MuMIn package to average the top 6 models, and we ran a dredge of the averaged model to verify the prior methods. It worked out well! – Kilian Murphy Mar 05 '21 at 09:20
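A minimal sketch of the parallel approach suggested in the comments, with a cap on model size via `m.lim` (the worker count and the 10-term cap are assumptions, not values from the thread; `subset` can impose finer restrictions on which combinations are fitted):

```r
library(parallel)
library(lme4)
library(MuMIn)

options(na.action = "na.fail")

## Set up a PSOCK cluster; each worker needs the modelling package
## loaded and the data exported to it.
clust <- makeCluster(4)                 # 4 workers is an assumption
clusterEvalQ(clust, library(lme4))
clusterExport(clust, "mydata")

## m.lim = c(0, 10) restricts candidate models to at most 10 terms,
## which cuts the 2^41 search space down enormously.
all_models <- pdredge(Monster, cluster = clust, m.lim = c(0, 10))
stopCluster(clust)
```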
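And a rough-timing sketch following Kamil Bartoń's suggestion: time one full fit and one null fit, average the two costs, and scale by the number of candidate models:

```r
## Time the global model and a null model (keeping the random effects),
## then scale the average cost by the ~2^41 candidate models.
t_full <- system.time(update(Monster))["elapsed"]
t_null <- system.time(
  lmer(Fw.FratioFall ~ 1 + (1 | YEAR) + (year | StudyArea),
       REML = FALSE, data = mydata)
)["elapsed"]

est_seconds <- (t_full + t_null) / 2 * 2^41
est_seconds / (3600 * 24 * 365)   # in years, for perspective
```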

1 Answer


As some commenters said, the `dredge` function's running time grows with the number of candidate models to test. A recommendation is to select the variables that you think have the best explanatory power: you should not "go fishing" for models, but instead generate only the candidate models that make sense for your question.
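As a minimal sketch of that approach (the formulas and data below are placeholders, not the question's variables): build a small, named candidate set by hand and rank it with `model.sel()` instead of fitting every subset:

```r
library(lme4)
library(MuMIn)

## Hypothetical a priori candidate set; formulas stand in for whatever
## combinations your domain knowledge suggests.
cand <- list(
  climate   = lmer(y ~ temp + I(temp^2) + (1 | site), REML = FALSE, data = dat),
  predation = lmer(y ~ predators + cover + (1 | site), REML = FALSE, data = dat),
  combined  = lmer(y ~ temp + predators + cover + (1 | site), REML = FALSE, data = dat)
)

## Rank the handful of candidates by AICc instead of dredging 2^k subsets.
model.sel(cand)
```

From there, `model.avg()` on the top-ranked subset (e.g. `model.avg(model.sel(cand), subset = delta < 4)`) averages the best models, which is roughly what the asker describes doing in the comments above.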

I've tried a model with 19 variables, and the estimated running time was 79 minutes.