I'm encountering an unexpected lack of correspondence between the estimated effects of an unpaired `t.test()` and `lm()` in base R's `stats` package. While an independent-samples t-test correctly yields t = 0, and a manual calculation of the OLS slope also outputs an estimate of 0, `lm()` gives a near-zero approximation (specifically, -3.239e-14).
This pattern holds across machines and OS versions (Mac and Windows), as tested by my coworkers. Would anyone be able to explain the discrepancy? The most similar post does not seem to apply here.
```r
library(dplyr) ## needed for %>% and filter()

# Populating an example dataframe
id <- rep(1, 16)
dose <- rep(c(rep("Low", 4), rep("High", 4)), 2)
domain <- c(rep("Domain1", 8), rep("Domain2", 8))
dv <- c(100, 0, 100, 100, 0, 100, 100, 100, 0, 0, 0, 0, 0, 100, 50, 0)
df <- data.frame(id, dose, domain, dv)

# Independent-samples t-test
t.test(dv ~ dose,
       data = filter(df, domain == "Domain1") %>% droplevels(),
       paired = FALSE)
# data: dv by dose
# t = 0, df = 6, p-value = 1
```
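A minimal sketch of why the t-test can return exactly 0, assuming the default Welch statistic (`var.equal = FALSE`): it is the difference of the two group means divided by its standard error, and both Domain1 groups average exactly 75, so the numerator is computed as an exact 0. The base-R `subset()` call and the names `d1`, `high`, `low`, `diff_means` are illustrative stand-ins, not part of the original code.

```r
# Rebuild the Domain1 subset with base R (illustrative names, no dplyr needed)
dose <- rep(c(rep("Low", 4), rep("High", 4)), 2)
domain <- c(rep("Domain1", 8), rep("Domain2", 8))
dv <- c(100, 0, 100, 100, 0, 100, 100, 100, 0, 0, 0, 0, 0, 100, 50, 0)
d1 <- subset(data.frame(dose, domain, dv), domain == "Domain1")

high <- d1$dv[d1$dose == "High"]  # 0, 100, 100, 100
low  <- d1$dv[d1$dose == "Low"]   # 100, 0, 100, 100

# Welch t statistic by hand: (difference of means) / (standard error)
diff_means <- mean(high) - mean(low)  # 75 - 75, an exact 0 in double precision
se <- sqrt(var(high) / length(high) + var(low) / length(low))
t_manual <- diff_means / se

tt <- t.test(dv ~ dose, data = d1)
c(diff_means = diff_means, t_manual = t_manual, t_test = unname(tt$statistic))
```

Because no rounding enters the numerator, the reported statistic is a true 0 rather than a tiny residue.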
```r
# Linear regression of factor IV and numeric DV
summary(lm(as.numeric(dv) ~ as.factor(dose),
           data = filter(df, domain == "Domain1")))
# Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
# (Intercept)         7.500e+01  2.500e+01       3    0.024 *
# as.factor(dose)Low -3.239e-14  3.536e+01       0    1.000
```
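`lm()` does not subtract group means directly; it solves the least-squares problem through a QR decomposition of the model matrix, which the fitted object keeps in `fit$qr`. A minimal sketch, assuming the -3.239e-14 is ordinary floating-point rounding from that decomposition, is to test the coefficient against a tolerance instead of for exact equality, or to round it for display with `zapsmall()`. The names `d1`, `fit`, and `b_low` are illustrative, and the exact residue printed may differ from the one above.

```r
# Rebuild the Domain1 subset with base R (illustrative names)
dose <- rep(c(rep("Low", 4), rep("High", 4)), 2)
domain <- c(rep("Domain1", 8), rep("Domain2", 8))
dv <- c(100, 0, 100, 100, 0, 100, 100, 100, 0, 0, 0, 0, 0, 100, 50, 0)
d1 <- subset(data.frame(dose, domain, dv), domain == "Domain1")

fit <- lm(dv ~ as.factor(dose), data = d1)
b_low <- unname(coef(fit)[2])  # Low mean minus High mean; may carry rounding residue

class(fit$qr)                 # "qr": the decomposition lm() solved with
abs(b_low) < 1e-8             # zero up to floating-point tolerance
isTRUE(all.equal(b_low, 0))   # all.equal() compares with a tolerance (about 1.5e-8)
zapsmall(coef(fit))           # rounds values negligible relative to the largest (75) to 0
```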
```r
# Manual calculation of OLS slope via the normal equations
X <- model.matrix(dv ~ dose, data = filter(df, domain == "Domain1"))
y <- with(filter(df, domain == "Domain1"), dv)
R <- t(X) %*% X          # X'X
solve(R) %*% t(X) %*% y  # (X'X)^-1 X'y
#             [,1]
# (Intercept)   75
# doseLow        0
```
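The normal-equations output above prints 0, but a different least-squares solver on the same data need not land on the same bits. As a sketch, assuming base R's `qr.solve()` (which, like `lm()`, goes through a QR decomposition) as the point of comparison, both routes can be checked against the exact group-mean answer (intercept 75, slope 0) with a tolerance. The names `b_normal`, `b_qr`, and `exact` are illustrative.

```r
# Rebuild the Domain1 design matrix and response with base R (illustrative names)
dose <- rep(c(rep("Low", 4), rep("High", 4)), 2)
domain <- c(rep("Domain1", 8), rep("Domain2", 8))
dv <- c(100, 0, 100, 100, 0, 100, 100, 100, 0, 0, 0, 0, 0, 100, 50, 0)
d1 <- subset(data.frame(dose, domain, dv), domain == "Domain1")
X <- model.matrix(~ dose, data = d1)  # columns: (Intercept), doseLow
y <- d1$dv

b_normal <- drop(solve(crossprod(X), crossprod(X, y)))  # solves (X'X) b = X'y
b_qr <- qr.solve(X, y)                                  # least squares via QR

exact <- c(75, 0)  # High mean; Low mean minus High mean
rbind(normal_eq = b_normal, qr = b_qr, exact = exact)
isTRUE(all.equal(unname(b_normal), exact))
isTRUE(all.equal(unname(b_qr), exact))
```

Comparing with `all.equal()` (or rounding with `zapsmall()`) treats both solutions as the same answer, which is the usual way to compare doubles that come out of different numerical routes.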