I'm trying to fit a generalized linear mixed model (GLMM) to explain the presence/absence of a species using 30 fixed-effect environmental variables and 2 random effects ("Location" and "Season"). My data looks like this:

str(glmm_data)

'data.frame':   209 obs. of  40 variables:
 $ CODE                : Factor w/ 209 levels "VAL1_1","VAL1_2",..: 1 72 142 170 176 183 190 197 203 8 ...
 $ Location            : Factor w/ 32 levels "ALMENARA","ARES 1",..: 10 11 12 15 17 2 3 4 21 18 ...
 $ Season              : Factor w/ 7 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ PO4                 : num  -1.301 -1.301 -1.301 0.437 -1.301 ...
 $ NO2                 : num  -1.129 -1.629 -0.781 -1.699 -1.654 ...
 $ NO3                 : num  1.044 0.115 1.918 1.457 1.467 ...
 $ NH4                 : num  0.0123 -0.014 -1.301 -0.2772 -1.301 ...
 $ ChlA                : num  0.341 0.117 0.87 -0.699 1.53 ...
 $ Secchi              : num  29 23 10 17 20 9 22 25 25 24 ...
 $ Temp_w              : num  5.4 3.2 10.3 10.5 4.7 7.2 8 9.2 4.6 6.9 ...
 $ Conductivity        : num  2.74 2.52 2.76 2.36 2.66 ...
 $ Oxi_conc            : num  11.6 9.2 7.04 9.99 7 ...
 $ Hydroperiod         : int  0 0 0 0 1 0 1 0 0 0 ...
 $ Rain                : int  1 1 1 1 1 1 1 1 1 1 ...
 $ RainFre             : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Veg_flo             : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Veg_emg             : num  0.735 0.524 0.226 0.685 0.226 ...
 $ Depth_max           : num  1.64 1.57 1.18 1.11 1.85 ...
 $ Agricultural        : num  0 0 0 0 0 ...
 $ LowGrass            : num  0 0.41 0.766 0 0.856 ...
 $ Forest              : num  1.097 1.161 0.44 1.05 0.502 ...
 $ Buildings           : num  0 0 0 0 0 ...
 $ Heterogeneity       : num  0.512 0.437 1.028 0.559 0.98 ...
 $ Morphology          : num  0.04519 -0.00115 0.01556 0.00771 0.12125 ...
 $ Fish                : int  0 0 0 0 0 0 0 0 0 0 ...
 $ TempRange           : num  1.4 1.4 1.4 1.4 1.4 ...
 $ Tavg                : num  1.03 1 1.03 1.03 1 ...
 $ Precipitation       : num  2.8 2.82 2.8 2.81 2.8 ...
 $ MatOrg              : num  0.264 0.257 0.236 0.251 0.313 ...
 $ CO3                 : num  0.14 0.163 0.222 0.335 0.306 ...
 $ PC1                 : num  -0.132 -0.186 -0.074 0.127 -0.175 ...
 $ PC2                 : num  -0.0729 0.0568 -0.0428 -0.0688 -0.0464 ...
 $ PC3                 : num  -0.00638 0.01857 0.02817 -0.00918 0.02056 ...
 $ Alytes_obstetricans : int  0 0 0 0 0 0 1 0 0 0 ...
 $ Bufo_spinosus       : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Epidalea_calamita   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Pelobates_cultripes : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Pelodytes_hespericus: int  1 0 0 0 0 0 0 0 0 0 ...
 $ Pelophylax_perezi   : int  0 0 0 0 1 0 1 0 0 0 ...
 $ Pleurodeles_waltl   : int  0 0 0 0 0 0 0 0 0 0 ...

PS: If anyone knows a better way to show my data, please explain; I'm new to this.
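One common way to share data in an R question is `dput()`, which prints R code that recreates an object exactly. The sketch below uses a small hypothetical data frame, `toy`, standing in for `glmm_data`; its values are made up for illustration only.

```r
# Hypothetical stand-in for glmm_data (values are invented for illustration)
toy <- data.frame(
  Location = factor(c("ALMENARA", "ARES 1", "ALMENARA", "ARES 1")),
  Season   = factor(c("1", "1", "2", "2")),
  PO4      = c(-1.301, 0.437, -1.301, 0.115),
  Alytes_obstetricans = c(0L, 1L, 0L, 0L)
)

# dput() prints R code that rebuilds the object; paste that output into the question
dput(toy)

# Round trip: the printed text can be parsed back into an equivalent object
rebuilt <- eval(parse(text = paste(capture.output(dput(toy)), collapse = "\n")))

# For the real data, a subset of rows and columns is usually enough, e.g.:
# dput(droplevels(head(glmm_data[, c("Location", "Season", "PO4",
#                                    "Alytes_obstetricans")], 20)))
```

Readers can then assign the pasted output to a variable and work with the same structure (factor levels and column types included) as in the original session.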

The last 7 columns are the response variables, namely the presence (1) or absence (0) of each species, so each response is binary. I'm using the `glmer` function from the `lme4` package.

I want to fit one model per species. The first one looks like this:

library(lme4)

# Binomial GLMM: 30 fixed effects plus random intercepts for Location and Season
Aly_Obs_GLMM <- glmer(
  Alytes_obstetricans ~ PO4 + NO2 + NO3 + NH4 + ChlA +
    Secchi + Temp_w + Conductivity + Oxi_conc + Hydroperiod + Rain + RainFre +
    Veg_flo + Veg_emg + Depth_max + Agricultural + LowGrass + Forest + Buildings +
    Heterogeneity + Morphology + Fish + TempRange + Tavg + Precipitation +
    MatOrg + CO3 + PC1 + PC2 + PC3 + (1 | Location) + (1 | Season),
  family = binomial,
  data = glmm_data
)

However, when I run the code, I get the following error message:

Error in pwrssUpdate(pp, resp, tol = tolPwrss, GQmat = GHrule(0L), compDev = compDev, : Downdated VtV is not positive definite

and the model is not created.
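Before changing the model, a few base-R checks on the fixed-effect predictors are commonly suggested for this kind of failure: look for constant columns, check whether the design matrix is rank-deficient, and put predictors on a comparable scale. The sketch below runs on a hypothetical data frame `toy` that reuses a few column names from the question; the constant and collinear columns are planted on purpose and say nothing about what `glmm_data` actually contains.

```r
set.seed(1)
n <- 209

# Hypothetical stand-in predictors (simulated, not the real data)
toy <- data.frame(
  Secchi     = runif(n, 5, 30),      # large-scale predictor
  Morphology = rnorm(n, 0, 0.05),    # small-scale predictor
  Veg_flo    = rep(0, n),            # planted constant column
  PC1        = rnorm(n)
)
toy$PC1_copy <- toy$PC1 * 2          # planted exact collinearity

# 1. Predictors that never vary carry no information
sds <- sapply(toy, sd)
constant_cols <- names(sds)[sds == 0]

# 2. Compare the rank of the fixed-effects design matrix to its column count
X <- model.matrix(~ ., data = toy)
rank_X <- qr(X)$rank
rank_deficient <- rank_X < ncol(X)

# 3. Centre and scale the remaining predictors to mean 0, sd 1
keep <- setdiff(names(toy), constant_cols)
toy_scaled <- as.data.frame(scale(toy[keep]))

constant_cols
rank_deficient
```

If any of these checks flags a problem on the real data, dropping or combining the affected predictors before calling `glmer` is a reasonable first step; whether that resolves this particular error is an assumption to test, not a guarantee.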

Any ideas on what I may be doing wrong? Thanks

user438383
Alexennis
  • See [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It is difficult to reproduce as written. – cazman Sep 13 '21 at 16:30
  • This is a coding question, which is definitely ok for Stack Overflow, but it may end up being more of a statistical question. If so, consider Cross Validated or r-sig-mixed-models. I'm guessing your model is too complex for your data even if you had a continuous response variable. The [limiting sample size](https://hbiostat.org/doc/rms.pdf#section.4.4) for a binary response variable is ~the smaller of either the # of 0 or 1's. Given that, at most you have around 100 pieces of information to estimate all those coefficients. Maybe try a simpler model to see if you can get things working? – aosmith Sep 13 '21 at 16:57
  • provide sample data using `dput` – Nad Pat Sep 13 '21 at 17:10
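
The limiting-sample-size point in the comments can be checked directly: count the 0s and 1s in the response and compare the smaller count with the number of fixed-effect coefficients. The sketch below uses simulated data in a hypothetical `toy` data frame (the 15% presence rate and the choice of `Temp_w` and `Depth_max` as the two retained predictors are arbitrary assumptions), then fits a much smaller model if `lme4` is installed.

```r
set.seed(42)
n <- 209

# Hypothetical stand-in for glmm_data (simulated values)
toy <- data.frame(
  Location  = factor(sample(paste0("site", 1:32), n, replace = TRUE)),
  Season    = factor(sample(1:7, n, replace = TRUE)),
  Temp_w    = rnorm(n),
  Depth_max = rnorm(n),
  Alytes_obstetricans = rbinom(n, 1, 0.15)
)

# Limiting sample size for a binary response: the smaller of the two class counts
counts <- table(factor(toy$Alytes_obstetricans, levels = c(0, 1)))
limiting_n <- min(counts)

# Observations in the rarer class per fixed-effect coefficient in the full model
n_fixed <- 30
events_per_coef <- limiting_n / n_fixed

counts
events_per_coef

# A deliberately small model as a starting point (runs only if lme4 is installed).
# The simulated data has no true group effects, so a singular-fit message is expected.
if (requireNamespace("lme4", quietly = TRUE)) {
  small_fit <- lme4::glmer(
    Alytes_obstetricans ~ Temp_w + Depth_max + (1 | Location) + (1 | Season),
    family = binomial, data = toy
  )
  print(summary(small_fit))
}
```

With 209 rows the smaller class can never exceed 104 observations, so 30 fixed effects leave only a handful of informative observations per coefficient even in the best case. Starting small and adding predictors gradually makes it easier to see where fitting breaks down.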

0 Answers