
I'm performing t-tests and I'm getting the error "grouping factor must have exactly 2 levels". It concerns another data set. Do you know what "levels" refers to?

Unfortunately, the data set is too big to post here (3 x 272 rows).

vicky
  • Have a [look at this](http://stackoverflow.com/questions/29421475/basic-t-test-grouping-factor-must-have-exactly-2-levels) – Sotos May 23 '16 at 09:43
  • In addition to @Sotos, for future questions please try to provide a minimal dataset (e.g. with `dput(head(df))` where `df` is your data set; this gives us 6 rows of the data) along with code that reproduces the error that you get (if it doesn't work on the minimal dataset, try using `head(df, 100)` for example). I presume that that's why your question was downvoted, as it is hard to help someone with just an error message. – slamballais May 23 '16 at 09:47
  • Thanks for helping me understand how this forum works. So this is the result of dput: `structure(list(schoolid = c(1L, 1L, 1L, 1L, 1L, 1L), score = c(0L, 10L, 0L, 40L, 42L, 4L), gender = c(0, 0, 0, 0, 0, 0)), .Names = c("schoolid", "score", "gender"), row.names = c(NA, 6L), class = "data.frame")`. In the meantime I read the link that Sotos suggested and changed the delta sign to a comma and I got back another error message, this time "error in t.test(score, schoolid) : object 'score' not found" – vicky May 23 '16 at 09:57
  • This doesn't seem to be a programming question at all. Instead, it seems to be a general R usage question, which is thus off-topic on Stack Overflow. – John Coleman May 23 '16 at 10:12
  • Very sorry. Didn't know Stack Overflow was specifically for programming ... – vicky May 23 '16 at 11:16
  • In your defense, the distinction between using R and programming in R is a bit fuzzy. The `r` tag suggests this for more straight statistics questions: http://stats.stackexchange.com . I haven't used that site so I am not quite sure if your question would be on-topic there, though it is likely to be a better fit. – John Coleman May 23 '16 at 11:58

2 Answers


This happens because `schoolid` has more than two unique values (levels).

For example, this code reproduces your problem:

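n <- 50  # number of simulated observations (the output below has 47 residual df)
dat <- data.frame(
    schoolid = sample(3, n, replace = TRUE),  # three schools, so three levels
    score = runif(n, 0, 100)                  # random scores between 0 and 100
)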


t.test(score ~ schoolid, data = dat)
Error in t.test.formula(score ~ schoolid, data = dat) : 
  grouping factor must have exactly 2 levels
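
To check this on your own data, count the levels of the grouping column. Here is a minimal sketch, assuming a small made-up data frame (the values are invented for illustration and are not taken from your data):

```r
# Invented example data: three schools, two scores each
dat <- data.frame(
    schoolid = c(1, 1, 2, 2, 3, 3),
    score    = c(10, 40, 42, 4, 55, 61)
)

# A factor stores the distinct group labels as its "levels"
school <- factor(dat$schoolid)

levels(school)       # the distinct group labels
nlevels(school)      # how many groups there are; the formula t-test needs exactly 2
table(dat$schoolid)  # number of rows in each group
```

If `nlevels()` returns anything other than 2 (including 1, when every row belongs to the same group), the formula interface of `t.test()` stops with this error.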

A t-test compares the means between two groups only. If you have more than two groups, you'll have to follow a different strategy. For example, compare one group to all other groups. In the next example, you compare the group with schoolid == 1 to all other groups:

t.test(score ~ schoolid == 1, data = dat)

    Welch Two Sample t-test

data:  score by schoolid == 1
t = 0.55568, df = 17.757, p-value = 0.5854
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -14.02586  24.10004
sample estimates:
mean in group FALSE  mean in group TRUE 
           51.51903            46.48194 
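
If you only care about two particular schools, another option is to restrict the data to those two groups before testing. A rough sketch with invented data (the scores and the choice of schools 1 and 2 are assumptions for illustration):

```r
# Invented example data: three schools, four scores each
dat <- data.frame(
    schoolid = rep(1:3, each = 4),
    score    = c(12, 35, 40, 28, 55, 61, 47, 52, 70, 66, 81, 74)
)

# Keep only schools 1 and 2, so the grouping variable has exactly two levels
two_schools <- subset(dat, schoolid %in% c(1, 2))

# Welch two-sample t-test on the remaining two groups
res <- t.test(score ~ schoolid, data = two_schools)
res
```

Keep in mind that repeating this for every pair of schools means running several tests, so the p-values should then be adjusted for multiple comparisons.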

You may also want to consider using a different test altogether. For example, a linear model can handle multiple subgroups:

model <- lm(score ~ factor(schoolid) - 1, data = dat)
summary(model)

Call:
lm(formula = score ~ factor(schoolid) - 1, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-44.416 -18.396  -5.337  23.672  45.752 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
factor(schoolid)1   46.482      7.772   5.981 2.88e-07 ***
factor(schoolid)2   50.309      6.176   8.146 1.55e-10 ***
factor(schoolid)3   52.729      6.176   8.537 4.07e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 26.92 on 47 degrees of freedom
Multiple R-squared:  0.7883,    Adjusted R-squared:  0.7748 
F-statistic: 58.34 on 3 and 47 DF,  p-value: 7.084e-16
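
A note on reading this output: because of the `- 1` (no intercept), each coefficient is simply the mean score of one school, and its t-test asks whether that mean differs from zero, not whether the schools differ from each other. The following sketch, using invented data, contrasts this with the default parameterisation, where the coefficients are differences from the first school:

```r
# Invented example data: three schools, four scores each
dat <- data.frame(
    schoolid = rep(1:3, each = 4),
    score    = c(12, 35, 40, 28, 55, 61, 47, 52, 70, 66, 81, 74)
)

# Without an intercept: one coefficient per school, equal to that school's mean
m_means <- lm(score ~ factor(schoolid) - 1, data = dat)
coef(m_means)
tapply(dat$score, dat$schoolid, mean)  # same numbers as the coefficients

# With the default intercept: school 1 is the reference level, and the
# other coefficients are differences from school 1
m_diffs <- lm(score ~ factor(schoolid), data = dat)
coef(m_diffs)

# Overall F-test of whether the school means differ
anova(m_diffs)
```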
Andrie

When doing a t-test, you compare two groups. How many possible values does `schoolid` have? If it's not two, you have an explanation of your error. In that case you should look into other tests, e.g. ANOVA. Good luck!
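
As a minimal sketch of the ANOVA route, assuming invented data with three schools (the scores are made up for illustration), something along these lines should work:

```r
# Invented example data: three schools, four scores each
dat <- data.frame(
    schoolid = rep(1:3, each = 4),
    score    = c(12, 35, 40, 28, 55, 61, 47, 52, 70, 66, 81, 74)
)

# Treat the school id as a grouping factor, not as a number
dat$schoolid <- factor(dat$schoolid)

# One-way ANOVA: do the mean scores differ between schools?
fit <- aov(score ~ schoolid, data = dat)
summary(fit)

# Follow-up pairwise t-tests; p-values are Holm-adjusted by default
pw <- pairwise.t.test(dat$score, dat$schoolid)
pw
```

The ANOVA tells you whether there is any difference between the schools at all; the pairwise tests then indicate which schools differ from which.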

Jasper