11

I am relatively new to R. For my assignment I have to start by conducting a t-test looking at the effect of a politician's party (Conservative or Labour) on their real gross wealth and real net wealth. I have to attempt to estimate the effect of serving in office on wealth using a simple t-test.

The dataset is called takehome.dta

Labour and Tory are binary where 1 indicates that they serve for that party and 0 otherwise.

The variables for wealth are lnrealgross and lnrealnet.

I have imported and attached the dataset, but when I attempt to conduct a simple t-test, I get the following message: "grouping factor must have exactly 2 levels". I'm not quite sure where I am going wrong. Any assistance would be appreciated!

Adam Huffman
Chris Thwaites
    Please add sample data and show your code (see [these guidelines for making a reproducible example](http://stackoverflow.com/a/28481250/215487)). – Christopher Bottoms Apr 02 '15 at 20:36

3 Answers

18

Are you doing this:

t.test(y~x)

when you mean to do this?

t.test(y,x)

In general, use the ~ when you have data like

y <- 1:10
x <- rep(letters[1:2], each = 5)

and the , when you have data like

y <- 1:5
x <- 6:10

I assume you're doing something like:

y <- 1:10
x <- rep(1,10)
t.test(y~x) #instead of t.test(y,x)

because the error suggests you have no variation in the grouping factor x (it takes only one distinct value, so there are not two groups to compare).
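
A quick way to check this diagnosis is to count the distinct values in the grouping variable before calling `t.test`. The sketch below uses made-up data: the data frame `dat` and its columns `wealth` and `party` are hypothetical stand-ins, not the asker's actual variables.

```r
# Hypothetical data: the grouping column contains only one distinct value
dat <- data.frame(wealth = c(10.2, 11.5, 9.8, 12.1, 10.9, 11.3),
                  party  = rep(1, 6))

# Count observations per group; a single entry means a single level
table(dat$party)

# The formula interface fails because there is only one group
res <- try(t.test(wealth ~ party, data = dat), silent = TRUE)
inherits(res, "try-error")   # TRUE

# With two groups present, the same call works
dat$party <- rep(c(0, 1), each = 3)
fit <- t.test(wealth ~ party, data = dat)
fit$p.value
```

If `table()` shows only one group, or more than two, the grouping variable needs fixing before the formula interface can be used.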

user1317221_G
5

The difference between ~ and , is the type of statistical test you are running. ~ gives you the mean differences. This is for dependent samples (e.g. before and after). , gives you the difference in means. This is for independent samples (e.g. treatment and control). These two tests are not interchangeable.

  • 1
    This is incorrect. The `paired` argument in the function call is what distinguishes dependent from independent samples; it is unrelated to whether you pass your samples as two numeric vectors, `t.test(x, y, paired = TRUE)`, or as one longer vector with a two-level factor, `df <- data.frame(z = c(x, y), f = rep(c("a", "b"), each = length(x))); t.test(z ~ f, paired = TRUE, data = df)`. See this [sthda easy guide on the topic](http://www.sthda.com/english/wiki/paired-samples-t-test-in-r) – Fons MA Apr 11 '21 at 22:59
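
The point in the comment above can be checked directly. This is a minimal sketch on simulated data (the vectors `x` and `y` and the data frame `df` are made up for illustration): the formula and two-vector interfaces give the same independent-samples result, and it is `paired = TRUE` that switches to the dependent-samples test.

```r
set.seed(1)                      # made-up data for illustration
x <- rnorm(8, mean = 5)
y <- rnorm(8, mean = 6)

# Long format: one value column plus a two-level grouping factor
df <- data.frame(z = c(x, y),
                 f = factor(rep(c("a", "b"), each = length(x))))

two_vectors <- t.test(x, y)              # independent samples, two vectors
formula_fit <- t.test(z ~ f, data = df)  # independent samples, formula

# Same test and same statistic, regardless of interface
all.equal(unname(two_vectors$statistic), unname(formula_fit$statistic))

# The paired switch is what changes the test
paired_fit <- t.test(x, y, paired = TRUE)
paired_fit$method
```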
1

I was having a similar problem and, given the size of my dataset, did not realize that one of my y's had no values for one of my levels. I had taken a series of gene readings for two groups, and one gene had readings only for group 2 and not group 1. I hadn't even noticed, but for some reason this produced the same error I would get if I had too many levels. The solution is to remove that y (in my case, that gene) from the analysis, and then the error goes away.
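
A likely explanation is that the formula interface drops rows with missing values before it builds the grouping factor, so a group with no readings vanishes and only one level remains. Here is a rough sketch of screening for this ahead of time, assuming a hypothetical data frame `dat` with a `group` column and one column per gene (all names and values are invented):

```r
# Hypothetical data: gene2 has readings only for group g2
dat <- data.frame(
  group = factor(rep(c("g1", "g2"), each = 4)),
  gene1 = c(1.1, 0.9, 1.3, 1.0, 2.1, 1.9, 2.2, 2.0),
  gene2 = c(NA, NA, NA, NA, 3.1, 2.9, 3.3, 3.0)
)

genes <- c("gene1", "gene2")

# For each gene, count the groups with at least two non-missing readings
groups_ok <- sapply(genes, function(g) {
  counts <- table(dat$group[!is.na(dat[[g]])])
  sum(counts >= 2)
})

# Keep only genes that have usable data in both groups
testable <- genes[groups_ok == 2]

# Run the t-test for each remaining gene
results <- lapply(testable, function(g) {
  t.test(reformulate("group", response = g), data = dat)
})
names(results) <- testable
```

Genes filtered out this way can be reported separately instead of silently breaking the loop.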

Maya Gough