
My data frame "MyDataRisk" has 36692 rows, with 1 binomial response variable (Risk), 3 continuous predictors and 1 categorical predictor. The variable "id" identifies the study site. Here is a summary:

> summary(MyDataRisk)
       id                  Landscape        Road_width         Risk
 Min.   :  1.00   Forest       : 7214   Min.   :3.800   Min.   :0.0000
 1st Qu.: 11.00   Double hedge : 4955   1st Qu.:5.500   1st Qu.:0.0000
 Median : 31.00   Simple hedge : 3490   Median :6.000   Median :0.0000
 Mean   : 40.92   Perp_Hedge   :15433   Mean   :6.005   Mean   :0.1875
 3rd Qu.: 66.00   Edge         : 4020   3rd Qu.:6.400   3rd Qu.:0.0000
 Max.   :112.00   No_vegetation: 1580   Max.   :7.700   Max.   :1.0000
 Vegetation_height   Vegetation_Distance
 Min.   :-2.17260    Min.   :-1.32359
 1st Qu.:-0.54750    1st Qu.:-0.82262
 Median :-0.08318    Median : 0.04941
 Mean   : 0.00000    Mean   : 0.00000
 3rd Qu.: 0.61329    3rd Qu.: 1.04935
 Max.   : 2.70271    Max.   : 1.74702

I fitted the following glmmPQL model to my response variable:

library(MASS)  # provides glmmPQL()

# All main effects plus every two-way interaction, with a random intercept per site
Mod1 <- glmmPQL(Risk ~ Vegetation_height + Road_width + Vegetation_Distance + Landscape
                + Vegetation_height:Road_width
                + Vegetation_height:Vegetation_Distance
                + Vegetation_height:Landscape
                + Road_width:Vegetation_Distance
                + Road_width:Landscape
                + Vegetation_Distance:Landscape,
                data = MyDataRisk,
                family = binomial, random = ~ 1 | id)

And I obtain the following error message:

iteration 1
Error in MEEM(object, conLin, control$niterEM) : 
Singularity in backsolve at level 0, block 1

The level "No_vegetation" of the categorical variable "Landscape" has no variability in "Vegetation_height" or "Vegetation_Distance", which makes sense, and the problem could come from there. If I remove the "No_vegetation" observations, the model fits without error. However, I would like to keep these observations because they are important in my design, and test the interactions for every level of "Landscape" except "No_vegetation".
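One way to check this suspicion is to look at the rank of the fixed-effects design matrix. The following is a minimal sketch on synthetic data, not on MyDataRisk: the data frame `d`, its three-level `Landscape` factor and the constant values assigned within "No_vegetation" are made up for illustration. If both vegetation covariates are constant within one level, their interaction columns for that level are exact multiples of that level's dummy column, so the matrix loses rank.

```r
# Synthetic stand-in for MyDataRisk (all values are invented for illustration)
set.seed(1)
n <- 600
d <- data.frame(
  Landscape = factor(sample(c("Forest", "Edge", "No_vegetation"), n, replace = TRUE),
                     levels = c("Forest", "Edge", "No_vegetation")),
  Vegetation_height   = rnorm(n),
  Vegetation_Distance = rnorm(n)
)

# Mimic the situation described above: no variability within "No_vegetation"
no_veg <- d$Landscape == "No_vegetation"
d$Vegetation_height[no_veg]   <- -2.17
d$Vegetation_Distance[no_veg] <- 1.75

# Fixed-effects design matrix with the covariate-by-Landscape interactions
X <- model.matrix(~ Vegetation_height + Vegetation_Distance + Landscape
                  + Vegetation_height:Landscape
                  + Vegetation_Distance:Landscape, data = d)

qr_X <- qr(X)
ncol(X)     # number of fixed-effect columns
qr_X$rank   # smaller than ncol(X) when some columns are linearly dependent

# Columns that the pivoted QR decomposition moved to the end as redundant
redundant <- colnames(X)[qr_X$pivot[-seq_len(qr_X$rank)]]
redundant
```

Running the same check on the real model matrix would show whether the "No_vegetation" interaction columns are the ones causing the singularity.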

I know that it is possible to modify the design matrix of the model, but I could not work out whether that would help in my case, or how to do it.
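As a rough illustration of what modifying the design matrix could look like, the sketch below builds the matrix with `model.matrix()`, drops the columns that the QR decomposition flags as linearly dependent, and fits on the remaining columns. It is only a sketch under assumptions: the data are synthetic (including the simulated `Risk` and site effects), and `X_red`, `d_fit` and `fixed` are hypothetical helper names. Whether this is appropriate for the real data depends on the design.

```r
library(MASS)  # glmmPQL(); it relies on nlme, which ships with R

# Synthetic stand-in for MyDataRisk (all values are invented for illustration)
set.seed(42)
n <- 3000
d <- data.frame(
  id = sample(1:30, n, replace = TRUE),
  Landscape = factor(sample(c("Forest", "Edge", "No_vegetation"), n, replace = TRUE),
                     levels = c("Forest", "Edge", "No_vegetation")),
  Vegetation_height   = rnorm(n),
  Vegetation_Distance = rnorm(n)
)
no_veg <- d$Landscape == "No_vegetation"
d$Vegetation_height[no_veg]   <- -2.17   # constant within "No_vegetation"
d$Vegetation_Distance[no_veg] <- 1.75
site_effect <- rnorm(30, sd = 0.5)       # simulated random intercepts per site
d$Risk <- rbinom(n, 1, plogis(-1.5 + site_effect[d$id] + 0.3 * d$Vegetation_height))

# Full design matrix, then keep only linearly independent columns
X <- model.matrix(~ Vegetation_height + Vegetation_Distance + Landscape
                  + Vegetation_height:Landscape
                  + Vegetation_Distance:Landscape, data = d)
qr_X  <- qr(X)
keep  <- sort(qr_X$pivot[seq_len(qr_X$rank)])
X_red <- X[, keep, drop = FALSE]

# The formula adds its own intercept, so drop that column; make names syntactic
X_red <- X_red[, colnames(X_red) != "(Intercept)", drop = FALSE]
colnames(X_red) <- make.names(colnames(X_red))

# Fit on the reduced columns; the No_vegetation main effect is still in the model
d_fit <- data.frame(Risk = d$Risk, id = d$id, X_red)
fixed <- reformulate(colnames(X_red), response = "Risk")
fit <- glmmPQL(fixed, random = ~ 1 | id, family = binomial,
               data = d_fit, verbose = FALSE)
fit$coefficients$fixed
```

In this sketch the "No_vegetation" main effect is kept while its two interaction columns are dropped, so the interactions are estimated only for the levels where the vegetation covariates actually vary.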

  • You might need to define No_vegetation as a dummy variable (i.e. 1 = has vegetation, 0 = no vegetation). It could be an issue of perfect multicollinearity, because as you say there is no variability in the explanatory values Vegetation height, and vegetation distance where Landscape = No_vegetation. – Mako212 Jul 19 '17 at 15:47
  • Thank you for your answer. However, I have trouble understanding how to use that, because it feels like redundant information: should I just add a new variable (1 = has vegetation, 0 = no vegetation) to my dataset? And how should I then include this new variable in the model formula? – Charlotte R Jul 20 '17 at 08:26
  • Can you provide a small sample of your data? (See [How to Make a Great Reproducible Example](https://stackoverflow.com/a/5963610/4421870).) – Mako212 Jul 20 '17 at 15:19
