3

I would like to use the t.test function to compare groups of values stored in a data frame. Let's say my data frame has 2 columns, "group" and "result", and 40 rows. The "result" column contains the values I want to compare, and the "group" column indicates which group each value belongs to: for example, 4 groups (a, b, c, d) of 10 values each.

How can I indicate that I only want to test the values belonging to group a versus the values belonging to group b?

Alternatively, is there a simple way to extract the values belonging to group a into a vector (let's call it "vecta") so that I can compare the vectors at will?

Thanks in advance! Seb

Seb Matamoros
  • 4
    Welcome to Stack Overflow. Post some sample data and you would most likely be very surprised at how much more quickly someone is going to be able to reply with an answer that you find helpful. – A5C1D2H2I1M1N2O1R2T1 Sep 26 '13 at 15:21
  • [Here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) you can find some hints to accomplish what Ananda just said – Jilber Urbina Sep 26 '13 at 15:24

4 Answers

8

You asked: "How can I indicate that I only want to test the values belonging to group a versus the values belonging to group b ?"

Suppose your data frame is called df. To compare group a with group b using t.test, you can use, e.g.:

t.test(df$result[df$group=="a"], df$result[df$group=="b"])
# or
with(df, t.test(result[group=="a"], result[group=="b"]))
# or, for example
t.test(result~group, data=subset(df, group %in% c("a", "b")))

All approaches should work but are untested as you didn't post any example data:P
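For anyone who wants to check this, here is a minimal sketch on made-up data. The data frame below is hypothetical (random values, built only to match the layout described in the question), so the numbers themselves mean nothing; it just shows that the three calls run and agree.

```r
# Hypothetical data: 4 groups (a-d) of 10 values each, as described in the question
set.seed(1)
df <- data.frame(
  group  = rep(c("a", "b", "c", "d"), each = 10),
  result = rnorm(40)
)

# Three equivalent ways to run a two-sample t-test of group a vs. group b
t1 <- t.test(df$result[df$group == "a"], df$result[df$group == "b"])
t2 <- with(df, t.test(result[group == "a"], result[group == "b"]))
t3 <- t.test(result ~ group, data = subset(df, group %in% c("a", "b")))

# The three p-values should be identical
c(t1$p.value, t2$p.value, t3$p.value)
```

Note that t.test defaults to the Welch test (var.equal = FALSE); pass var.equal = TRUE if you want the classical pooled-variance version.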

"Alternatively, is there a simple way to extract the values belonging to the group a into a vector (let's call it "vecta") in order to compare the vectors at will ?"

Yes,

df$result[df$group=="a"]  # result is a vector
lebatsnok
2

There's an example for your situation at the end of the help on t.test:

## Classical example: Student's sleep data
plot(extra ~ group, data = sleep)
## Traditional interface
with(sleep, t.test(extra[group == 1], extra[group == 2]))
## Formula interface
t.test(extra ~ group, data = sleep)

The second form (the formula interface) is the easiest when you have two groups; you have 4.

One way you could do it (let's say your data frame is called yourdata) would be

with(yourdata, t.test(result[group == "a"], result[group == "b"]))

As for extracting the values where the group indicator takes a particular value, the way to do that is given in the first form in the help above (in your case with(yourdata, result[group=="a"]) would give you just the results for group a).

Glen_b
  • 1
    +1. One suggestion: I'd use `group %in% "a"` instead of `==` to avoid selecting records with NA group values. For example: `group <- c("a", "b", NA); group[group == "a"]; group[group %in% "a"]`. This is a scenario when that's unlikely, but using `%in%` as my default has helped me avoid a lot of headaches over the years. – Matt Parker Sep 26 '13 at 15:45
  • 1
    @MattParker +1 It's an excellent point (though NAs won't hurt the example here). My reason for preferring `==` was to simply encourage using the information in the examples in help, and that's how the relevant example was done under `?t.test` – Glen_b Sep 26 '13 at 15:50
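To make the `==` versus `%in%` point from these comments concrete, here is a small sketch. The two vectors are made up for illustration and are not from the question.

```r
# Hypothetical vectors with a missing group label
group  <- c("a", "b", NA, "a")
result <- c(1.2, 3.4, 5.6, 7.8)

# `==` returns NA where group is NA, and an NA index yields an NA element
with_eq <- result[group == "a"]

# `%in%` returns FALSE for NA, so only the true matches are kept
with_in <- result[group %in% "a"]

with_eq
with_in
```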
1

The following is not "at will", but rather, automatic calculation of all pairs of "group" variables.

Here's some sample data:

mydf <- data.frame(
  group = rep(letters[1:4], each = 10),
  result = c(1:10, 5:14, 11:20, 15:24)
)
mydf

You can use combn to create the "pairs" of each group to use t.test on.

combn(as.character(unique(mydf$group)), 2, 
      FUN = function(y) t.test(result ~ group, 
                               mydf[mydf$group %in% y ,]), 
      simplify = FALSE)
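If you only need the p-values rather than the full test output, one possible variation is sketched below. It assumes the same mydf as above (recreated here so the snippet runs on its own); the names pairs and pvals are just illustrative.

```r
# Same sample data as above, recreated so this snippet is self-contained
mydf <- data.frame(
  group = rep(letters[1:4], each = 10),
  result = c(1:10, 5:14, 11:20, 15:24)
)

# All pairs of group labels, kept as a list of length-2 character vectors
pairs <- combn(unique(as.character(mydf$group)), 2, simplify = FALSE)

# Run t.test on each pair and keep only the p-value
pvals <- sapply(pairs, function(y) {
  t.test(result ~ group, data = mydf[mydf$group %in% y, ])$p.value
})

# Label each p-value with the pair it belongs to, e.g. "a-b"
names(pvals) <- sapply(pairs, paste, collapse = "-")
pvals
```

These p-values are not adjusted for multiple comparisons; p.adjust(pvals) can be applied if that matters for your analysis.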

As for extracting separate vectors, I think that a list of vectors might be more convenient, for which you can use split:

x <- split(mydf$result, mydf$group)
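A short sketch of how that list could then be used, assuming the same mydf as above (recreated here so it runs on its own); vecta is the name suggested in the question.

```r
# Same sample data as above, recreated so this snippet is self-contained
mydf <- data.frame(
  group = rep(letters[1:4], each = 10),
  result = c(1:10, 5:14, 11:20, 15:24)
)

# One vector per group, stored in a named list
x <- split(mydf$result, mydf$group)

# Pull out a single group as a plain vector
vecta <- x$a

# Compare any two groups by name
res_ab <- t.test(x$a, x$b)
res_ab
```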
A5C1D2H2I1M1N2O1R2T1
1
with(subset(df, group %in% c("a", "b")),
     t.test(result ~ factor(group)))
#
# df - your data.frame
#
Wojciech Sobala