
I've searched extensively for a solution to my problem, to no avail. I am doing a simple calculation of the intraclass correlation, which in this case is an estimate of heritability, and I'd like to repeat it for each of 12,000+ individual genes. Here is the small data frame I've been using to get the calculation working:

Strain_ID  ENS001       ENS056       ENS058
5          3.06928082   3.038645597  2.985282543
5          2.868997097  2.666932055  2.793392732
5          3.235929871  2.90516985   2.630776507
7          3.002625449  2.868878032  2.363580624
7          3.150054756  2.881093606  2.474916595
7          3.138522184  2.693864389  2.490961619

Here Strain_ID is a factor and the ENS* column names are identifiers for individual genes. The data frame is called "h2_saline". When I run the following, I get the expected answer:

exp.lm <- lm(ENS001 ~ Strain_ID, data = h2_saline)
library(car)
Anova(exp.lm)
Var.within <- Anova(exp.lm)["Residuals", "Sum Sq"]
Var.between <- Anova(exp.lm)["Strain_ID", "Sum Sq"]
h2 <- (Var.between - Var.within) / (Var.between + (2 * Var.within))

Which is:

> Var.within
[1] 0.08095382
> Var.between
[1] 0.002281289
> h2
[1] -0.4791586
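Written out with SS_B and SS_W for the between-strain and within-strain sums of squares from the ANOVA table, the code above computes the first formula below, and substituting the printed values reproduces the result. The second formula is an aside that rests on an assumption about intent: the standard one-way intraclass correlation, ICC(1), is usually defined on mean squares (MS = SS/df, with df = 1 and 4 here) and k replicates per group, so with k = 3 the factor 2 in the code corresponds to k - 1. Since car::Anova() reports Sum Sq and Df but no Mean Sq column, the mean squares would be Sum Sq divided by Df. Under that definition the same data give roughly -0.42 rather than -0.48.

```latex
h^2 = \frac{SS_B - SS_W}{SS_B + 2\,SS_W}
    = \frac{0.002281289 - 0.08095382}{0.002281289 + 2 \times 0.08095382}
    = \frac{-0.078672531}{0.164188929}
    \approx -0.4792

% Standard one-way ICC(1) on mean squares, MS = SS / df, k = 3 replicates per strain
\mathrm{ICC}(1) = \frac{MS_B - MS_W}{MS_B + (k-1)\,MS_W}
    = \frac{0.002281289/1 - 0.08095382/4}{0.002281289/1 + 2 \times 0.08095382/4}
    \approx \frac{-0.017957}{0.042758}
    \approx -0.420
```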

(See http://jeromyanglim.blogspot.com/2013/12/using-r-to-replicate-common-spss.html for the rationale behind using the car package, if you're interested.)

However, when I store the column name in a variable (x), so that I can loop over every column in the data frame, I get an error message. For example:

assign("x", "ENS056")

exp.lm <- lm(x ~ Strain_ID, data = h2_saline)
library(car)
Anova(exp.lm)
Var.within <- Anova(exp.lm)["Residuals", "Sum Sq"]
Var.between <- Anova(exp.lm)["Strain_ID", "Sum Sq"]
h2 <- (Var.between - Var.within) / (Var.between + (2 * Var.within))

I get this output:

> assign("x", "ENS056")
> 
> exp.lm <- lm(x ~ Strain_ID, data = h2_saline)
Error in model.frame.default(formula = x ~ Strain_ID, data=h2_saline,    : 
  variable lengths differ (found for 'Strain_ID')
> library(car)
> Anova(exp.lm)
Anova Table (Type II tests)

Response: ENS001
            Sum Sq Df F value Pr(>F)
Strain_ID 0.002281  1  0.1127 0.7539
Residuals 0.080954  4               
> Var.within <- Anova(exp.lm)["Residuals", "Sum Sq"]
> Var.between <- Anova(exp.lm)["Strain_ID", "Sum Sq"]
> h2 <- (Var.between - Var.within) / (Var.between + (2 * Var.within))
> 

As you can see, it prints the results for the first variable that I entered explicitly (ENS001), not for the next column in the data frame (ENS056), which is what I wanted. I've tried assigning "ENS056" in a variety of ways and still get the same result.

Any help would be greatly appreciated.
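What seems to go wrong: assign("x", "ENS056") just creates x as a character vector of length 1 holding the string "ENS056". In the formula x ~ Strain_ID, R does not find a column called x in h2_saline, so it uses that length-1 vector itself as the response, hence "variable lengths differ" (1 value versus 6 for Strain_ID). Because lm() stops with an error, exp.lm is never overwritten, and the following Anova() calls simply report the earlier ENS001 model. A minimal sketch of one fix is to build the formula from the string with base R's reformulate() before fitting; as.formula(paste(x, "~ Strain_ID")) is an equivalent alternative. The data frame is re-created from the table above so the block runs on its own.

```r
# Example data from the question (Strain_ID is a factor).
h2_saline <- data.frame(
  Strain_ID = factor(c(5, 5, 5, 7, 7, 7)),
  ENS001 = c(3.06928082, 2.868997097, 3.235929871,
             3.002625449, 3.150054756, 3.138522184),
  ENS056 = c(3.038645597, 2.666932055, 2.90516985,
             2.868878032, 2.881093606, 2.693864389),
  ENS058 = c(2.985282543, 2.793392732, 2.630776507,
             2.363580624, 2.474916595, 2.490961619)
)

x <- "ENS056"  # same as assign("x", "ENS056"): x is only a string

# Turn the string into the formula ENS056 ~ Strain_ID before fitting,
# so lm() looks up the column named ENS056 inside h2_saline.
fml <- reformulate("Strain_ID", response = x)
exp.lm <- lm(fml, data = h2_saline)
print(fml)

# Equivalent alternative: fml <- as.formula(paste(x, "~ Strain_ID"))
```

With exp.lm fitted this way, the remaining lines from the question (Anova(exp.lm) and the h2 calculation) should run unchanged.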

  • I do not think assign("x", "ENS056") is what you want to do. That creates a new variable x with the string value of "ENS056". – Raad May 06 '16 at 16:50
  • In a moment, I or someone else will find a suitable duplicate that shows how to construct model formulas properly. – joran May 06 '16 at 17:06
  • I'll find a few more of the several dozen duplicates floating around for further context... – joran May 06 '16 at 17:08
  • e.g. [here](http://stackoverflow.com/q/9238038/324364), or [here](http://stackoverflow.com/a/30265548/324364), or... – joran May 06 '16 at 17:11
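Along the lines of those comments, here is a minimal sketch of the full loop over genes. A small helper (the name h2_for_gene is hypothetical) fits the model for one column name and applies the question's formula unchanged, and vapply() runs it over every column except Strain_ID, returning a named numeric vector. As an assumption to avoid the extra dependency, it uses base R's anova() instead of car::Anova(); with a single factor in the model, the sequential and Type II sums of squares coincide.

```r
# Example data from the question (Strain_ID is a factor).
h2_saline <- data.frame(
  Strain_ID = factor(c(5, 5, 5, 7, 7, 7)),
  ENS001 = c(3.06928082, 2.868997097, 3.235929871,
             3.002625449, 3.150054756, 3.138522184),
  ENS056 = c(3.038645597, 2.666932055, 2.90516985,
             2.868878032, 2.881093606, 2.693864389),
  ENS058 = c(2.985282543, 2.793392732, 2.630776507,
             2.363580624, 2.474916595, 2.490961619)
)

# Hypothetical helper: heritability estimate for one gene column,
# using the same sums-of-squares formula as in the question.
h2_for_gene <- function(gene, data) {
  fit <- lm(reformulate("Strain_ID", response = gene), data = data)
  tab <- anova(fit)  # rows "Strain_ID" and "Residuals"
  ss_between <- tab["Strain_ID", "Sum Sq"]
  ss_within  <- tab["Residuals", "Sum Sq"]
  (ss_between - ss_within) / (ss_between + 2 * ss_within)
}

# Every column except the grouping factor is treated as a gene.
genes <- setdiff(names(h2_saline), "Strain_ID")
h2 <- vapply(genes, h2_for_gene, numeric(1), data = h2_saline)
print(h2)
```

For the full data set the same two last lines apply; the result can then be sorted or written out like any named vector. The value for ENS001 should match the -0.4791586 obtained by hand above.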
