I am analyzing the occupancy of bat boxes and the factors that influence it. To determine which variables significantly affect occupancy, I am fitting a binomial GLM with a complementary log-log link, with occupancy as the response variable (0 = occupied, 1 = not occupied). All explanatory variables are numeric except one categorical variable, `mounted_on`, with four levels: the bat box is mounted on a tree, a pole, a balcony, or a facade.
My code is:

```r
modelb <- glm(occupation ~ TreeCov + distance_to_water + mounted_on,
              family = binomial(link = "cloglog"), data = mydata)
```
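For anyone who wants to reproduce the setup without my data, here is a minimal sketch that simulates a hypothetical data set with the same columns and fits the same model. The sample size, effect sizes and values are made up for illustration, so its output will differ from my real results below.

```r
set.seed(1)
n <- 200

# Hypothetical data with the same columns as mydata (all values simulated)
mydata <- data.frame(
  TreeCov           = runif(n, 0, 100),
  distance_to_water = runif(n, 0, 2000),
  mounted_on        = factor(sample(c("BALCONY", "FACADE", "POLE", "TREE"),
                                    n, replace = TRUE))
)

# Simulate a 0/1 response through the inverse cloglog link: p = 1 - exp(-exp(eta))
eta <- -1 + 0.02 * mydata$TreeCov - 0.0003 * mydata$distance_to_water +
  c(BALCONY = 0, FACADE = 0.8, POLE = 0.5, TREE = 0.2)[as.character(mydata$mounted_on)]
mydata$occupation <- rbinom(n, 1, 1 - exp(-exp(eta)))

modelb <- glm(occupation ~ TreeCov + distance_to_water + mounted_on,
              family = binomial(link = "cloglog"), data = mydata)

# factor() sorts the levels alphabetically, so BALCONY is the default reference
levels(mydata$mounted_on)
summary(modelb)$coefficients
```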
In the model output I get the following p-values (the reference level of `mounted_on` is BALCONY):
```
Coefficient          p-value
TreeCov              0.0344
distance_to_water    0.1291
mounted_onTREE       0.7676
mounted_onFACADE     0.4319
mounted_onPOLE       0.0770
```
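When I change the reference level from BALCONY to TREE with

```r
mydata$mounted_on <- relevel(factor(mydata$mounted_on), ref = "TREE")
```

and refit the model, the p-values for TreeCov and distance_to_water stay the same, but the p-values for the dummy variables change: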
```
Coefficient          p-value
TreeCov              0.0344
distance_to_water    0.1291
mounted_onBALCONY    0.45272
mounted_onFACADE     0.0122
mounted_onPOLE       0.02661
```
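To check what releveling actually changes, here is a sketch on simulated (hypothetical) data rather than my own. It fits the same cloglog GLM with BALCONY and then TREE as the reference. The log-likelihood and the TreeCov and distance_to_water estimates should be identical, and each dummy coefficient in the releveled model should equal a difference of coefficients from the first model (for example, FACADE vs TREE = FACADE vs BALCONY minus TREE vs BALCONY). So the dummy p-values test different comparisons, not a different model.

```r
set.seed(42)
n <- 200

# Hypothetical data; the response is simulated independently of the covariates
d <- data.frame(
  TreeCov           = runif(n, 0, 100),
  distance_to_water = runif(n, 0, 2000),
  mounted_on        = factor(sample(c("BALCONY", "FACADE", "POLE", "TREE"),
                                    n, replace = TRUE))
)
d$occupation <- rbinom(n, 1, 0.4)

# Model 1: default reference level (BALCONY, first alphabetically)
fit_balcony <- glm(occupation ~ TreeCov + distance_to_water + mounted_on,
                   family = binomial(link = "cloglog"), data = d)

# Model 2: same model, TREE as the reference level
d_tree <- d
d_tree$mounted_on <- relevel(d_tree$mounted_on, ref = "TREE")
fit_tree <- glm(occupation ~ TreeCov + distance_to_water + mounted_on,
                family = binomial(link = "cloglog"), data = d_tree)

b1 <- coef(fit_balcony)
b2 <- coef(fit_tree)

# Same fitted model: identical log-likelihood and numeric-covariate estimates
all.equal(as.numeric(logLik(fit_balcony)), as.numeric(logLik(fit_tree)))
all.equal(b1[c("TreeCov", "distance_to_water")], b2[c("TreeCov", "distance_to_water")])

# Each dummy now compares a mount type with TREE instead of BALCONY
all.equal(unname(b2["mounted_onFACADE"]),  unname(b1["mounted_onFACADE"] - b1["mounted_onTREE"]))
all.equal(unname(b2["mounted_onBALCONY"]), unname(-b1["mounted_onTREE"]))
```

Each `all.equal()` call should return `TRUE` up to numerical tolerance.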
How do I choose which level of `mounted_on` should be the reference level?