Your problem is (as the comments make clear) that you are running out of memory to perform the calculation. Another very important point is why you want to perform this regression at all.
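To see why the memory blows up: lm builds a dense design matrix with one row per observation and one dummy column per factor level. A rough back-of-the-envelope sketch (the sizes n and k below are hypothetical, not taken from your data):

n <- 1e6   # hypothetical number of observations
k <- 1e4   # hypothetical number of factor levels
# a dense double-precision design matrix needs roughly n * k * 8 bytes
n * k * 8 / 2^30
#output
[1] 74.50581   # ~75 GiB, far more RAM than most machines have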
With an OLS model that contains only a single factor variable (dummy variable) with multiple levels, what you are actually estimating are the group means, in this case the mean of y for each name. Most least-squares implementations use a QR decomposition on a treatment-contrast design matrix, meaning the intercept is the mean of the first group, while the other coefficients are the differences between the remaining group means and the intercept. This is the case in R's lm function. But we can still get the coefficients out, calculate R-squared, etc., if we really want to. For illustration, here is an example using the mtcars dataset:
data(mtcars)
# fit the usual OLS on a single factor; lm() uses treatment contrasts by default
fit <- lm(mpg ~ factor(cyl), data = mtcars)
# rebuild the same coefficients from the group means alone
coefs <- tapply(mtcars$mpg, mtcars$cyl, mean)
intercept <- coefs[1]                        # mean of the first (reference) group
beta <- c(intercept, coefs[-1] - intercept)  # other groups as differences from it
names(beta) <- c("(Intercept)", paste0("cyl", levels(factor(mtcars$cyl))[-1]))
beta
#output
(Intercept)        cyl6        cyl8
  26.663636   -6.920779  -11.563636
coef(fit)
#output
 (Intercept) factor(cyl)6 factor(cyl)8
   26.663636    -6.920779   -11.563636
all.equal(coef(fit), beta, check.attributes = FALSE)
#output
[1] TRUE
R-squared is calculated likewise.
But again, what do you really want to estimate? In this case a full linear regression is overkill.
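If the group means are really all you need, you can compute them directly without ever forming a design matrix. A minimal sketch, assuming your data sit in a data frame df with a numeric response y and a grouping variable name (hypothetical names):

g <- factor(df$name)
# per-level sums divided by per-level counts: no n x k design matrix involved
means <- rowsum(df$y, g) / tabulate(g)

Both rowsum and tabulate make a single linear pass over the rows, so this scales to data that lm cannot touch.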
Edit: R-squared
Note that R-squared can be calculated simply using the relation Rsquared = 1 - SSE / SST = SSF / SST. In the case of a single factor the fitted values are the group means, so SSF is proportional to var(fitted) and SST is proportional to var(response) (the n - 1 denominators cancel in the ratio), and R-squared can be obtained as
# fitted values of a one-factor model are just the group means
fitted <- ave(mtcars$mpg, mtcars$cyl, FUN = mean)
ssf <- var(fitted)      # model ("fitted") variance
sst <- var(mtcars$mpg)  # total variance of the response
r2 <- ssf / sst
all.equal(r2, summary(fit)$r.squared)
#output
[1] TRUE
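The same two lines apply unchanged to the large data set, so you can get the one-factor R-squared without fitting lm at all (again assuming the hypothetical df$y and df$name):

# group means as fitted values, computed in one pass over the data
fitted_all <- ave(df$y, df$name, FUN = mean)
r2_all <- var(fitted_all) / var(df$y)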