R ttest inside ddply gives error "grouping factor must have exactly 2 levels"

Question

I have a dataframe with several factors and two phenotypes

freq sampleID status score snpsincluded
0.5 0001 case 100 all 
0.2 0001 case 30 all 
0.5 0002 control 110 all 
0.5 0003 case 100 del 
etc

I would like to do a t.test comparing cases and controls for each set of relevant factors. I have tried the following:

o2 <- ddply(df, c("freq","snpsincluded"), summarise, pval=t.test(score, status)$p.value)

but it complains that " grouping factor must have exactly 2 levels"

I have no missing values, NAs, and Ive checked:

levels(df$status)
[1] "case"    "control"

Am I missing something stupid? Thanks!

agstudy · Accepted Answer · 2014-01-15T22:17:05.080

You get an error because , you get a for at least one sub-group , unique status value for all score's.

This reproduce the error, the status is unique (equal to 1) for all scores.

dx = read.table(text='   score status
1 1 1 
2 2 1 
3 3 1 ')

t.test(score ~ status, data = dx) 
Error in t.test.formula(score ~ status, data = dx) : 
  grouping factor must have exactly 2 levels

this correct the problem but create another known problem with t.test, you should have enough observations( I think >= 2):

dx = read.table(text='   score status
1 1 1 
2 2 1 
3 3 2 ')

t.test(score ~ status, data = dx) 
Error in t.test.default(x = 1:2, y = 3L) : not enough 'y' observations

Finally this correct all the problems:

dx = read.table(text='   score status
1 1 1 
2 2 1 
3 3 2 
4 4 2')

t.test(score ~ status, data = dx) 

Welch Two Sample t-test

data:  score by status
t = -2.8284, df = 2, p-value = 0.1056
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.042435  1.042435
sample estimates:
mean in group 1 mean in group 2 
            1.5             3.5

EDIT I explain the problem without giving a solution, because you don't give a reproducible example.

one solution is do computation only for good groups:

  ddply(df, c("freq","snpsincluded"), function(x)
      { 
       if(length(unique(x$status)==2)
         pval=t.test(score~status,data=x)$p.value
     })

Thanks for your quick answer! I'm not sure I understand "you get a unique status value for all score's" though: I have about 700 lines for both cases and controls. Also, I should add, I have a few more factors I left out for clarity, but those are the reason I do not like to do t-test on data subsets for each combination. If I do ttest of the whole dataset (without considering factors) it works. Again, thanks for help. — madieke, Jan 15 '14 at 22:05
@madieke I add some comments to clarify this. Better to see my example 's. — agstudy, Jan 15 '14 at 22:10

R ttest inside ddply gives error "grouping factor must have exactly 2 levels"

1 Answers1

Linked