I've searched for a solution to this problem without success. I am calculating the intraclass correlation, which in this case serves as an estimate of heritability, and I'd like to repeat the calculation over 12,000+ individual genes. Here is the small data frame that I've been using to get it working:
Strain_ID  ENS001       ENS056       ENS058
5          3.06928082   3.038645597  2.985282543
5          2.868997097  2.666932055  2.793392732
5          3.235929871  2.90516985   2.630776507
7          3.002625449  2.868878032  2.363580624
7          3.150054756  2.881093606  2.474916595
7          3.138522184  2.693864389  2.490961619
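For anyone who wants to reproduce this, the table can be rebuilt in R roughly as follows. This is a minimal sketch: the values are copied from the rows above, and converting Strain_ID with factor() is my assumption about how the real data frame was set up.

```r
# Rebuild the example table; values are copied from the rows above.
h2_saline <- data.frame(
  Strain_ID = factor(c(5, 5, 5, 7, 7, 7)),  # strain as a factor, not a number
  ENS001 = c(3.06928082, 2.868997097, 3.235929871,
             3.002625449, 3.150054756, 3.138522184),
  ENS056 = c(3.038645597, 2.666932055, 2.90516985,
             2.868878032, 2.881093606, 2.693864389),
  ENS058 = c(2.985282543, 2.793392732, 2.630776507,
             2.363580624, 2.474916595, 2.490961619)
)

# Inspect the column types to confirm Strain_ID is a factor.
str(h2_saline)
```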
Strain_ID is a factor and the ENS* columns are identifiers for individual genes. The data frame is called "h2_saline". When I run the following, I get the expected answer:
exp.lm <- lm(ENS001 ~ Strain_ID, data = h2_saline)
library(car)
Anova(exp.lm)
Var.within <- Anova(exp.lm)["Residuals", "Sum Sq"]
Var.between <- Anova(exp.lm)["Strain_ID", "Sum Sq"]
h2 <- (Var.between - Var.within) / (Var.between + (2 * Var.within))
This gives:
> Var.within
[1] 0.08095382
> Var.between
[1] 0.002281289
> h2
[1] -0.4791586
(Note: see http://jeromyanglim.blogspot.com/2013/12/using-r-to-replicate-common-spss.html for rationale for using the car package if you're interested.)
However, when I use a variable ("x") that holds a column name, so that I can loop through each column in the data frame, I get an error message. For example:
assign("x", "ENS056")
exp.lm <- lm(x ~ Strain_ID, data = h2_saline)
library(car)
Anova(exp.lm)
Var.within <- Anova(exp.lm)["Residuals", "Sum Sq"]
Var.between <- Anova(exp.lm)["Strain_ID", "Sum Sq"]
h2 <- (Var.between - Var.within) / (Var.between + (2 * Var.within))
I get this output:
> assign("x", "ENS056")
>
> exp.lm <- lm(x ~ Strain_ID, data = h2_saline)
Error in model.frame.default(formula = x ~ Strain_ID, data=h2_saline, :
variable lengths differ (found for 'Strain_ID')
> library(car)
> Anova(exp.lm)
Anova Table (Type II tests)
Response: ENS001
Sum Sq Df F value Pr(>F)
Strain_ID 0.002281 1 0.1127 0.7539
Residuals 0.080954 4
> Var.within <- Anova(exp.lm)["Residuals", "Sum Sq"]
> Var.between <- Anova(exp.lm)["Strain_ID", "Sum Sq"]
> h2 <- (Var.between - Var.within) / (Var.between + (2 * Var.within))
>
As you can see, the lm() call fails, so exp.lm still holds the model for the first gene that I entered explicitly (ENS001), and the Anova output is for that gene rather than for the next column in the data frame (ENS056), which is what I wanted. I've tried assigning "ENS056" in a variety of ways and still get the same result.
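My understanding is that lm() looks up x, finds the length-one character vector "ENS056" instead of a column, and so reports that the variable lengths differ. One approach that might do what I'm after is to build the formula from the column name rather than passing x directly. The following is only a sketch under some assumptions: h2_for_gene is a hypothetical helper name, reformulate() is from base R's stats package, and base anova() is swapped in for car::Anova() so the example runs without extra packages (with a single factor in the model, both report the same sums of squares).

```r
# Small stand-in for the real data (values copied from the table above).
h2_saline <- data.frame(
  Strain_ID = factor(c(5, 5, 5, 7, 7, 7)),
  ENS001 = c(3.06928082, 2.868997097, 3.235929871,
             3.002625449, 3.150054756, 3.138522184),
  ENS056 = c(3.038645597, 2.666932055, 2.90516985,
             2.868878032, 2.881093606, 2.693864389),
  ENS058 = c(2.985282543, 2.793392732, 2.630776507,
             2.363580624, 2.474916595, 2.490961619)
)

# Hypothetical helper: the formula is assembled from the column name, so
# lm() sees e.g. ENS056 ~ Strain_ID rather than the literal symbol x.
h2_for_gene <- function(gene, data) {
  fit <- lm(reformulate("Strain_ID", response = gene), data = data)
  tab <- anova(fit)  # base R substitute for car::Anova() (one-factor model)
  Var.within  <- tab["Residuals", "Sum Sq"]
  Var.between <- tab["Strain_ID", "Sum Sq"]
  # Same expression as in the question, based on the sums of squares.
  (Var.between - Var.within) / (Var.between + (2 * Var.within))
}

# Every column except the grouping factor is treated as a gene.
genes <- setdiff(names(h2_saline), "Strain_ID")
h2_all <- sapply(genes, h2_for_gene, data = h2_saline)
h2_all
```

Here h2_all would be a named numeric vector with one entry per gene column, so the same call should extend to the full set of genes without an explicit loop.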
Any help would be greatly appreciated.