0

I run the following model in R:

 clmm_br<-clmm(Grado_amenaza~Life_Form  + size_max_cm + 
              leaf_length_mean  + petals_length_mean + 
              silicua_length_mean + bloom_length + categ_color+ (1|Genero) , 
      data=brasic1)

I didn't get any warnings or errors but when I run summary(clmm_br) I can't get the p-values:

  summary(clmm_br)

 Cumulative Link Mixed Model fitted with the Laplace approximation

 formula: Grado_amenaza ~ Life_Form + size_max_cm + leaf_length_mean +  
    petals_length_mean + silicua_length_mean + bloom_length +  
    categ_color + (1 | Genero)
 data:    brasic1

 link  threshold nobs logLik AIC    niter      max.grad cond.H
 logit flexible  76   -64.18 160.36 1807(1458) 1.50e-03 NaN   

 Random effects:

  Groups Name        Variance       Std.Dev. 
  Genero (Intercept) 0.000000008505 0.00009222

 Number of groups:  Genero 39 

 Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
Life_Form[T.G]        2.233338         NA      NA       NA
Life_Form[T.Hem]      0.577112         NA      NA       NA
Life_Form[T.Hyd]    -22.632916         NA      NA       NA
Life_Form[T.Th]      -1.227512         NA      NA       NA
size_max_cm           0.006442         NA      NA       NA
leaf_length_mean      0.008491         NA      NA       NA
petals_length_mean    0.091623         NA      NA       NA
silicua_length_mean  -0.036001         NA      NA       NA
bloom_length         -0.844697         NA      NA       NA
categ_color[T.2]     -2.420793         NA      NA       NA
categ_color[T.3]      1.268585         NA      NA       NA
categ_color[T.4]      1.049953         NA      NA       NA

Threshold coefficients:
    Estimate Std. Error z value
1|3   -1.171         NA      NA
3|4    1.266         NA      NA
4|5    1.800         NA      NA
(4 observations deleted due to missingness)

I tried with no random effects and excluding the rows with NAs but it's the same.

The structure of my data:

str(brasic1)
tibble[,13] [80 x 13] (S3: tbl_df/tbl/data.frame)
 $ ID                 : num [1:80] 135 137 142 145 287 295 585 593 646 656 ...
 $ Genero             : chr [1:80] "Alyssum" "Alyssum" "Alyssum" "Alyssum" ...
 $ Cons.stat          : chr [1:80] "LC" "VU" "VU" "LC" ...
 $ Amenazada          : num [1:80] 0 1 1 0 1 0 0 1 0 0 ...
 $ Grado_amenaza      : Factor w/ 5 levels "1","3","4","5",..: 1 2 2 1 4 1 1 2 1 1 ...
 $ Life_Form          : chr [1:80] "Th" "Hem" "Hem" "Th" ...
 $ size_max_cm        : num [1:80] 12 6 7 15 20 27 60 62 50 60 ...
 $ leaf_length_mean   : num [1:80] 7.5 7 11 14.5 31.5 45 90 65 65 39 ...
 $ petals_length_mean : num [1:80] 2.2 3.5 5.5 2.55 6 8 10.5 9.5 9.5 2.9 ...
 $ silicua_length_mean: num [1:80] 3.5 4 3.5 4 25 47.5 37.5 41.5 17.5 2.9 ...
 $ X2n                : num [1:80] 32 NA 16 16 NA NA 20 20 18 14 ...
 $ bloom_length       : num [1:80] 2 1 2 2 2 2 2 2 11 2 ...
 $ categ_color        : chr [1:80] "1" "4" "4" "4" ...
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
Ijimm
  • 1
  • Please make a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Martin Gal Jul 25 '21 at 21:09

1 Answers1

2

For a full answer we really need a reproducible example, but I can point to a few things that raise suspicions.

The fact that you can get estimates, but not standard errors, implies that there is something wrong with the Hessian (the estimate of the curvature of the log-likelihood surface at the maximum likelihood estimate), but there are several distinct (possibly overlapping possibilities)

  • any time you have a "large" parameter estimate (say, absolute value > 10), as in your example (Life_Form[T.Hyd] = -22.632916), it suggests complete separation, i.e. the presence/absence of that parameter perfectly predicts the response. (You can search for that term, e.g. on CrossValidated.) However, complete separation usually leads to absurdly large standard errors (along with the large parameter estimates) rather than to NAs.
  • you may have perfect multicollinearity, i.e. combinations of your predictor variables that are perfectly (jointly) correlated with other such combinations. Some R estimation procedures can detect and deal with this case (typically by dropping one or more predictors), but clmm might not be able to. (You should be able to construct your model matrix (X <- model.matrix( your_formula, your_data), excluding the random effect from the formula) and then use caret::findLinearCombos(X) to explore this issue.)
  • More generally, if you want to do reliable inference you may need to cut down the size of your model (not by stepwise or other forms of model selection); a rule of thumb is that you need 10-20 observations per parameter estimated. You're trying to estimate 12 fixed effect parameters plus a few more (ordinal-threshold parameters and random effect variance) from 80 observations ...

In addition to dropping random effects, it may be useful to a diagnosis to fit a regular linear model with lm() (which should tell you something about collinearity, by dropping parameters) or a binomial model based on some threshold grade values (which might help with identifying complete separation).

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453