Contrast or deviation coding v. penalties: how to get coefficient estimates for all factor levels, compared to the grand mean?

Question

I'm trying to do exactly what this post does: Comparing all factor levels to the grand mean: can I tweak contrasts in linear model fitting to show all levels? The difference is that I have a GAM with multiple categorical factors that I want to use contrast coding for, and I can't get the solution (the transformation matrix) to work for this more complex example. I'm using contrast coding so that each level of the two factors I'm interested in is compared to the grand mean, instead of to an arbitrary reference level (which always seems to be the first level). As with the existing post, I want to:

Maintain the reported factor level names (instead of them being assigned sequential numbers, e.g. I want levelA levelB levelC instead of level1, level2, level3)
Calculate the regression coefficients for all the levels (instead of N-1 levels) for TWO categorical factors coded with contrast coding

I understand how to work backwards from the intercept in the simple example, but I'm confused how the grand mean works in the case of multiple factors.

APPROACH 1)

# this is for count data, so showing an example with a poisson distribution and multiple factors - otherwise following @ZheyuanLi's answer

library(mgcv)  # provides gam()
set.seed(123)
y <- rpois(12, lambda=3)
x <- rpois(12, lambda=0.5)
f1 <- factor(rep(LETTERS[1:3], each = 4))
f2 <- factor(rep(LETTERS[4:5], each = 6))
fit <- gam(y ~ x + f1 + f2, contrasts = list(f1 = contr.sum, f2=contr.sum))
# I'm using a gam but I think it would be the same process for a lm
summary(fit)

ContrSumMat <- function (fctr, sparse = FALSE) {
  if (!is.factor(fctr)) stop("'fctr' is not a factor variable!")
  N <- nlevels(fctr)
  Cmat <- contr.sum(N, sparse = sparse)
  dimnames(Cmat) <- list(levels(fctr), seq_len(N - 1))
  Cmat
}

Cmat1 <- ContrSumMat(f1)
Cmat1
Cmat2 <- ContrSumMat(f2)
Cmat2

coef(fit)
## coefficients After Contrasts
coef_after <- coef(fit)[3:4]
# f1B       f1C 
# -1.000000 -2.676471

coef_before <- (Cmat1 %*% coef_after)[, 1]  # Cmat1 is the contrast matrix for f1
# But because it's not just f1B, f1C, and the Intercept, this doesn't get me f1A
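For reference, here is a minimal sketch of the multi-factor case. It uses glm() rather than gam(), on the assumption that the parametric terms are coded the same way. The helper name all_levels and the toy data are hypothetical, and it assumes no coefficient is aliased (NA). The idea is to pick out each factor's coefficients by position, using the "assign" attribute of the model matrix, and multiply them by that factor's own sum-to-zero contrast matrix. Note that under contr.sum the coefficients are named f11, f12 rather than f1B, f1C, so names like f1B suggest treatment contrasts were actually used in the fit.

```r
# Toy data: two crossed factors plus a numeric covariate (hypothetical values)
set.seed(123)
d <- data.frame(
  y  = rpois(24, lambda = 3),
  x  = rpois(24, lambda = 0.5),
  f1 = factor(rep(LETTERS[1:3], times = 8)),
  f2 = factor(rep(LETTERS[4:5], each = 12))
)

fit <- glm(y ~ x + f1 + f2, family = poisson, data = d,
           contrasts = list(f1 = contr.sum, f2 = contr.sum))

# Hypothetical helper: deviations (on the link scale) for every level of one factor
all_levels <- function(fit, data, fname) {
  Cmat <- contr.sum(levels(data[[fname]]))           # rows are named by level
  X <- model.matrix(fit)
  term_id <- match(fname, attr(terms(fit), "term.labels"))
  idx <- which(attr(X, "assign") == term_id)         # columns belonging to this factor
  drop(Cmat %*% coef(fit)[idx])                      # last level = minus the sum of the others
}

dev_f1 <- all_levels(fit, d, "f1")  # named A, B, C
dev_f2 <- all_levels(fit, d, "f2")  # named D, E
dev_f1
dev_f2
```

In this sketch each factor's deviations sum to zero separately, and the intercept is the linear predictor at x = 0 averaged (unweighted) over all combinations of the factor levels, so the linear predictor for, say, level A and level D at x = 0 is the intercept plus dev_f1["A"] plus dev_f2["D"].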

APPROACH 2) There's another answer to the same kind of question (but also for a simpler model) that works by explicitly suppressing the intercept to get the estimate of the last level: How to change contrasts to compare with mean of all levels rather than reference level (R, lmer)? The same approach is used here, also for a linear model: A linear model matrix where each level of a categorical is contrasted with the mean

But why does this change the coefficient estimates, and even the sign of the estimates? I don't care about the absolute value of the estimates, but I do care about each level's relative value compared to the other levels, and whether it's positive or negative.

set.seed(101)
w <- c("Monday", "Tuesday", "Wednesday", "Thursday", 
       "Friday", "Saturday", "Sunday")
dd <- data.frame(w=factor(rep(w,10),levels=w),y=rnorm(70))

m0 <- gam(y~w, data=dd, contrasts=list(w=contr.sum))
summary(m0)

## suppress the intercept
m1 <- gam(y~w-1, data=dd, contrasts=list(w=contr.sum))
summary(m1) # now you get an estimate for each level and no intercept, but the values are very different
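A minimal sketch of how the two fits relate, using lm() instead of gam() (an assumption made to keep the example in base R). When the intercept is removed, R codes the first factor with one indicator column per level and the contrasts setting no longer applies to it. The coefficients are then the level means themselves rather than deviations from the grand mean, which is why both the values and the signs can differ. Centering them on their own mean should recover the deviations.

```r
set.seed(101)
w <- c("Monday", "Tuesday", "Wednesday", "Thursday",
       "Friday", "Saturday", "Sunday")
dd <- data.frame(w = factor(rep(w, 10), levels = w), y = rnorm(70))

m0 <- lm(y ~ w,     data = dd, contrasts = list(w = contr.sum))
m1 <- lm(y ~ w - 1, data = dd, contrasts = list(w = contr.sum))

# No intercept: one indicator column per level, so these are the seven level means
level_means <- coef(m1)
names(level_means) <- levels(dd$w)

# Deviations of each level mean from the (unweighted) mean of the level means
dev_from_m1 <- level_means - mean(level_means)

# The same deviations rebuilt from the sum-to-zero fit
dev_from_m0 <- drop(contr.sum(levels(dd$w)) %*% coef(m0)[-1])

all.equal(dev_from_m1, dev_from_m0)
```

With a second factor in the model, only the first factor would get the full indicator coding after removing the intercept, so this shortcut applies to one factor at a time.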

Finally, the answer to the following post makes it sound like what I want isn't possible with contrasts, and I need to instead put a sum-to-zero penalty on each factor?

`lm` summary not display all factor levels

At the bottom: "If you really want to have all coefficients there, use constrained least squares, or penalized regression / linear mixed models."
