I'm attempting to run a regression (lm) over groups of data (counties) in a data frame. However, I first want to filter that data frame (dat) to exclude groups with too few data points. I can get everything to work fine as long as I don't subset the data frame first:
tmp1 <- with(dat,
             by(dat, County,
                function(x) lm(formula = Y ~ A + B + C, data = x)))
sapply(tmp1, function(x) summary(x)$adj.r.squared)
I get back, as expected:
 Barrow Carroll Cherokee Clayton    Cobb  Dekalb Douglas
0.00000     NaN  0.61952 0.69591 0.48092 0.61292 0.39335
However, when I first subset the data frame:
dat.counties <- aggregate(dat[, "County"], by = list(dat$County), FUN = length)
good.counties <- as.matrix(subset(dat.counties, x > 20, select = Group.1))
dat.temp <- dat[dat$County %in% good.counties, ]
and then run the same code on the subset:
tmp2 <- with(dat.temp,
             by(dat.temp, County,
                function(x) lm(formula = Y ~ A + B + C, data = x)))
sapply(tmp2, function(x) summary(x)$adj.r.squared)
I get the following error: "$ operator is invalid for atomic vectors". If I then run
summary(tmp2)
I see the following:
         Length Class  Mode
Barrow    0     -none- NULL
Carroll   0     -none- NULL
Cherokee 12     lm     list
Clayton  12     lm     list
The sapply call is obviously failing on the Class -none- objects. But those are exactly the counties I excluded above! How are they still showing up in my new data frame?
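In case it helps narrow this down, here is a minimal sketch on made-up data that seems to reproduce the symptom. It assumes County is stored as a factor, which has not been confirmed for the real dat. The names toy, toy.sub, toy.clean, fits and fits.clean are hypothetical and exist only for this illustration. The idea is that subsetting rows leaves the factor's levels untouched, so by() still creates one (empty) cell per excluded county, and that calling droplevels() on the subset may be one way to get rid of those cells.

```r
# Made-up data for illustration only: county "A" has 2 rows, "B" and "C" have 30 each.
set.seed(1)
toy <- data.frame(
  County = factor(rep(c("A", "B", "C"), times = c(2, 30, 30))),
  A = rnorm(62), B = rnorm(62), C = rnorm(62)
)
toy$Y <- toy$A + toy$B + toy$C + rnorm(62)

# Keep only the counties with more than 20 rows.
counts <- table(toy$County)
good <- names(counts)[counts > 20]
toy.sub <- toy[toy$County %in% good, ]

# The rows for "A" are gone, but "A" is still a level of the factor.
levels(toy.sub$County)   # still includes "A"
table(toy.sub$County)    # "A" has a count of 0

# by() makes one cell per factor level; the empty level comes back as NULL.
fits <- by(toy.sub, toy.sub$County,
           function(x) lm(Y ~ A + B + C, data = x))
sapply(fits, is.null)

# Dropping the unused levels before grouping removes the empty cells,
# so summary() is only ever called on real lm fits.
toy.clean <- droplevels(toy.sub)
fits.clean <- by(toy.clean, toy.clean$County,
                 function(x) lm(Y ~ A + B + C, data = x))
sapply(fits.clean, function(x) summary(x)$adj.r.squared)
```

If the same pattern holds for the real data, levels(dat.temp$County) should still list Barrow and Carroll after the subset.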
Thank you for any enlightenment.