In this example, R's lm function finds a very small, nonzero coefficient between two columns that ought to be systematically uncorrelated (and it only happens when predicting in one direction, not the other). Is it a rounding error? This becomes a real problem with lm.cluster, which turns the rounding error into a nearly significant effect.
## why does this happen? ##
library(miceadds)
id <- c(1, 1, 1, 1, 2, 2, 2, 2)
a <- c(5, 5, 5, 5, 1, 1, 1, 1)
b <- c(-0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5)
df <- data.frame(id, a, b)
df
reg <- lm(data = df, b ~ a)
## no correlation
summary(reg)
# Call:
# lm(formula = b ~ a, data = df)
# Residuals:
#    Min     1Q Median     3Q    Max 
#   -0.5   -0.5    0.0    0.5    0.5 
# Coefficients:
#               Estimate Std. Error t value Pr(>|t|)
# (Intercept) -7.494e-17  3.680e-01       0        1
# a            2.475e-17  1.021e-01       0        1
# Residual standard error: 0.5774 on 6 degrees of freedom
# Multiple R-squared:  2.696e-32, Adjusted R-squared:  -0.1667
# F-statistic: 1.618e-31 on 1 and 6 DF,  p-value: 1
reg <- lm(data = df, a ~ b)
## minuscule correlation
summary(reg)
# Call:
# lm(formula = a ~ b, data = df)
# Residuals:
#  Min   1Q Median   3Q  Max 
#   -2   -2      0    2    2 
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept) 3.000e+00  8.165e-01   3.674   0.0104 *
# b           2.183e-16  1.633e+00   0.000   1.0000  
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 2.309 on 6 degrees of freedom
# Multiple R-squared:  9.861e-32, Adjusted R-squared:  -0.1667
# F-statistic: 5.916e-31 on 1 and 6 DF,  p-value: 1
cluster_reg <- lm.cluster(data = df, a ~ b, cluster = "id")
summary(cluster_reg) ## nearly significant effect?!
The coefficient in all three regressions ought to be exactly 0, but for me the second and clustered regressions yield coefficients of 6.28e-16. Is this error unique to me? What could cause it, and how can I analyze data with this structure in a way that avoids this issue?
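For what it's worth, a coefficient around 6e-16 is on the order of double-precision machine epsilon (.Machine$double.eps is about 2.2e-16), which is consistent with floating-point noise from lm's internal least-squares solve rather than a real effect. A minimal sketch of how one might check whether such a slope is numerically zero (using the same toy data as above):

```r
# Same toy data as in the question.
a <- c(5, 5, 5, 5, 1, 1, 1, 1)
b <- c(-0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5)

# Slope of a ~ b; mathematically exactly 0, numerically ~1e-16.
slope <- coef(lm(a ~ b))[["b"]]

# Machine epsilon for doubles: same order of magnitude as the
# spurious coefficient.
print(.Machine$double.eps)

# Treat anything within a small tolerance of 0 as exactly 0.
print(isTRUE(all.equal(slope, 0)))

# zapsmall() rounds near-zero components of the coefficient
# vector to exact 0 for display.
print(zapsmall(coef(lm(a ~ b))))
```

This doesn't stop lm.cluster from seeing the noise, but it shows the value is indistinguishable from zero at double precision.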