My guess is that the author (who says "the NAs imply that the found coefficients are 0, but the NA-coefficient variables are still acting as controls over the model") is wrong (although it's hard to be 100% sure without having the full context).
The problem is almost certainly that you have some multicollinear predictors. Different variables get dropped (i.e. have NA coefficients returned) because R partly uses the order of the terms to decide which ones to drop. This doesn't matter for the fitted model: all of the top-level results (predictions, goodness of fit, etc.) are identical.
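The order dependence can be illustrated outside R. The sketch below walks through the columns from left to right and flags any column that adds no new rank to the columns kept so far. This is only a rough stand-in for how `lm()` reports NA coefficients (R itself uses a pivoted QR decomposition), and `aliased_columns`, `x1`, `x2`, `x3` are hypothetical names made up for the example. Swapping the column order changes which coefficient is "NA", but the fitted values are the same either way.

```python
import numpy as np

def aliased_columns(X_ord, names, tol=1e-7):
    """Flag columns (left to right) that are linear combinations of the
    columns already kept. A rough stand-in for R's NA coefficients."""
    kept, dropped = [], []
    for j, name in enumerate(names):
        trial = kept + [j]
        if np.linalg.matrix_rank(X_ord[:, trial], tol=tol) == len(trial):
            kept.append(j)
        else:
            dropped.append(name)
    return kept, dropped

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2  # exactly collinear with x1 and x2
y = 1.0 + 2.0 * x1 - x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x1, x2, x3])
names = ["(Intercept)", "x1", "x2", "x3"]

na_report, fitted = {}, {}
for label, order in (("x1 first", [0, 1, 2, 3]), ("x3 first", [0, 3, 2, 1])):
    Xo = X[:, order]
    kept, dropped = aliased_columns(Xo, [names[i] for i in order])
    coef, *_ = np.linalg.lstsq(Xo[:, kept], y, rcond=None)
    na_report[label] = dropped
    fitted[label] = Xo[:, kept] @ coef
    print(label, "-> NA coefficient for", dropped)

print("identical fitted values:", np.allclose(fitted["x1 first"], fitted["x3 first"]))
```

With `x1` listed first, `x3` is the column flagged as aliased. With `x3` listed first, `x1` is flagged instead.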
In the comments, the OP says:

> The relationship between `log_a` and `log_gm_a` is that this is a multiplicative fixed-effects model. So `log_a` is the log of predictor `a`. `log_gm_a` is the log of the geometric mean of `a`. So each of the `log_gm` terms is constant across all observations.
This is the key information needed to diagnose the problem. Because the intercept is excluded from this model (the formula contains `0+`), having one constant column in the model matrix is OK, but having multiple constant columns is trouble: all but the first (in whatever order is specified by the formula) will be discarded. To go slightly deeper: the model requested is
$$Y = b_1 C_1 + b_2 C_2 + b_3 C_3 + \text{[additional terms]}$$
where $C_1$, $C_2$, $C_3$ are constants. At the point in "data space" where the additional terms are 0 (i.e. for cases where `log_a = log_b = log_c = ... = 0`), we're left with predicting a constant value from three separate constant terms. Suppose that the intercept in a regular model (`~ 1 + log_a + log_b + log_c`) would have been $m$. Then any combination of $(b_1, b_2, b_3)$ that makes $b_1 C_1 + b_2 C_2 + b_3 C_3 = m$ (and there are infinitely many) will fit the data equally well.
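Here is a small numerical sketch of this argument. It assumes a hypothetical design with columns `log_a, log_gm_a, log_b, log_gm_b, log_c, log_gm_c` and no intercept, and it uses simulated positive predictors in place of the OP's data. The three constant columns together add only one dimension, so the model matrix is rank-deficient. Two coefficient vectors that differ only in $(b_1, b_2, b_3)$ give identical predictions, as long as $b_1 C_1 + b_2 C_2 + b_3 C_3$ is the same; `m`, `slopes`, and `full_coef` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
# Hypothetical positive predictors a, b, c standing in for the OP's data
a, b, c = rng.lognormal(mean=1.0, size=(3, n))

def log_gm(v):
    # log of the geometric mean == mean of the logs, repeated for every row
    return np.full_like(v, np.log(v).mean())

# Column order mimics a hypothetical formula y ~ 0 + log_a + log_gm_a + ...
X = np.column_stack([np.log(a), log_gm(a),
                     np.log(b), log_gm(b),
                     np.log(c), log_gm(c)])
print(X.shape[1], "columns, rank", np.linalg.matrix_rank(X))

C = X[0, [1, 3, 5]]                  # the constants C1, C2, C3
slopes = np.array([0.9, -0.4, 1.3])  # arbitrary slopes for log_a, log_b, log_c
m = 2.0                              # arbitrary value for the constant part

def full_coef(b2, b3):
    """Choose b2, b3 freely and solve b1*C1 + b2*C2 + b3*C3 = m for b1."""
    b1 = (m - b2 * C[1] - b3 * C[2]) / C[0]
    coef = np.empty(6)
    coef[[0, 2, 4]] = slopes
    coef[[1, 3, 5]] = [b1, b2, b3]
    return coef

coef_1 = full_coef(0.0, 0.0)
coef_2 = full_coef(5.0, -3.0)
print("same predictions:", np.allclose(X @ coef_1, X @ coef_2))
```

The rank comes out as 4 rather than 6: the three log columns account for three dimensions and the three constant columns for only one.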
I still don't know much about the context, but it might be worth considering adding the constant terms to the model as offsets. Or perhaps scale the predictors by their geometric means, i.e. subtract the log-geometric-means from the log-predictors?
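One way to read the second suggestion is sketched below, under stated assumptions: replace each `log_a` by `log_a - log_gm_a` (equivalently, the log of `a` divided by its geometric mean) and fit a single ordinary intercept instead of several constant columns. The data and the helper `centre_log` are hypothetical. Because subtracting a constant from a predictor only shifts the intercept, the centred design has full column rank, and it gives the same slopes and fitted values as the uncentred model with an intercept.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
# Hypothetical positive predictors and response
a, b = rng.lognormal(mean=1.0, size=(2, n))
y = 0.5 + 1.2 * np.log(a) - 0.7 * np.log(b) + rng.normal(scale=0.1, size=n)

def centre_log(v):
    # log(v) - log(geometric mean of v), i.e. log(v / gm(v))
    lv = np.log(v)
    return lv - lv.mean()

X_raw = np.column_stack([np.ones(n), np.log(a), np.log(b)])
X_ctr = np.column_stack([np.ones(n), centre_log(a), centre_log(b)])

fit_raw, *_ = np.linalg.lstsq(X_raw, y, rcond=None)
fit_ctr, *_ = np.linalg.lstsq(X_ctr, y, rcond=None)

print("rank of centred design:", np.linalg.matrix_rank(X_ctr))
print("same fitted values:", np.allclose(X_raw @ fit_raw, X_ctr @ fit_ctr))
print("same slopes:", np.allclose(fit_raw[1:], fit_ctr[1:]))
```

Only the intercept changes between the two fits. It now represents the expected log response when every predictor equals its geometric mean.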
In other cases, multicollinearity arises from unidentifiable interaction terms; nested variables; attempts to include all the levels of multiple categorical variables; or including the proportions of all levels of some compositional variable (e.g. proportions of habitat types, where the proportions add up to 1) in the model, e.g.
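As a rough illustration of the compositional case, this sketch simulates hypothetical proportions of three habitat types at each site, with each row summing to 1. Including an intercept plus all three proportions makes the columns linearly dependent, because the proportions add up to the intercept column. Leaving one level out restores full column rank.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
# Hypothetical proportions of three habitat types per site (rows sum to 1)
props = rng.dirichlet([2.0, 2.0, 2.0], size=n)

X_all = np.column_stack([np.ones(n), props])          # intercept + all 3 levels
X_drop = np.column_stack([np.ones(n), props[:, :2]])  # one level left out

print("rows sum to 1:", np.allclose(props.sum(axis=1), 1.0))
print("rank with all levels:", np.linalg.matrix_rank(X_all), "of", X_all.shape[1])
print("rank with one level dropped:", np.linalg.matrix_rank(X_drop), "of", X_drop.shape[1])
```

In R, the equivalent model with all three proportions would report an NA coefficient for whichever proportion comes last in the formula.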