-1

First, is this dataset in a tidy form for a t-test?

https://i.stack.imgur.com/tMK6R.png

Second, I'm trying to do a two sample t-test to compare the means at time 3 of treatment a and b for 'outcome 1'. How would I go about doing this?

Sample data:

df <- structure(list(code = c(100, 100, 100, 101, 101, 101, 102, 102, 
      102, 103, 103, 103), treatment = c("a", "a", "a", "b", "b", "b", 
      "a", "a", "a", "b", "b", "b"), sex = c("f", "f", "f", "m", "m", 
      "m", "f", "f", "f", "f", "f", "f"), time = c(1, 2, 3, 1, 2, 3, 
      1, 2, 3, 1, 2, 3), `outcome 1` = c(21, 23, 33, 44, 45, 47, 22, 
      34, 22, 55, 45, 56), `outcome 2` = c(21, 32, 33, 33, 44, 45, 
      22, 57, 98, 65, 42, 42), `outcome 3` = c(62, 84, 63, 51, 45, 
      74, 85, 34, 96, 86, 45, 47)), .Names = c("code", "treatment", 
      "sex", "time", "outcome 1", "outcome 2", "outcome 3"), 
      class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L))
www
DiscoR
  • Please read your data into R and post the output of dput() instead of an image of the data; also see [SO question tips](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – www Oct 09 '17 at 20:47
  • `?t.test` - has examples at the bottom of the page. First try, then if you still have problem post your attempt. – Gregor Thomas Oct 09 '17 at 20:49
  • @Gregor I tried the example test. The problem is the way my data is arranged. I thought I had arranged the data in tidy format. One of my outcome variables is taken at three different times. I would just like to compare the means of outcome variable 1 at time = 3 for treatment a and treatment b. I am not sure how to do this. I could rearrange my data; however, I thought the current format was a tidy format. Could you take a look at the dput() and see if the format looks tidy? – DiscoR Oct 09 '17 at 21:16
  • It's always good to show that you tried - post the code, post an error message. We got lots of questions from people who don't try, who haven't looked at the documentation. Sometimes you can't even tell if someone has read their data into R yet. If you show your attempt, we know exactly where you are and how to help. – Gregor Thomas Oct 09 '17 at 21:22

2 Answers

0

First, define the subsets you want to test; then you can run the t-test. You don't necessarily have to store the subsets in variables as I've done, but it makes the t-test output clearer.

Normally with t-test questions, I'd recommend the help provided by ?t.test, but since this involves more complex subsetting, I've included how to do that here:

var_a <- df$`outcome 1`[df$treatment=="a" & df$time==3]
var_b <- df$`outcome 1`[df$treatment=="b" & df$time==3]

t.test(var_a,var_b)

Output:

    Welch Two Sample t-test

data:  var_a and var_b
t = -3.3773, df = 1.9245, p-value = 0.08182
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -55.754265   7.754265
sample estimates:
mean of x mean of y 
     27.5      51.5 
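
The subsetting above relies on logical indexing: each comparison produces a TRUE/FALSE vector with one element per row, `&` combines two such vectors element-wise, and `[ ]` keeps the positions that are TRUE. Here is a minimal sketch of that mechanism; `toy` and its values are made up for illustration and are not the OP's data.

```r
# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  treatment = c("a", "a", "b", "b"),
  time      = c(1, 3, 1, 3),
  outcome   = c(10, 20, 30, 40)
)

# Each comparison returns a logical vector, one element per row
is_a  <- toy$treatment == "a"
is_t3 <- toy$time == 3

# `&` is element-wise AND: TRUE only where both conditions hold
keep <- is_a & is_t3

# Indexing with a logical vector keeps only the TRUE positions
toy$outcome[keep]
```

Using `|` instead of `&` would keep rows where either condition holds.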
www
  • Thanks! This is what I wanted to do! I keep getting this error though. I have imported the data set into R as "tidy_data", and the outcome variable in my dataset is weight. t.test(df$`weight`[df$treatment=="a" & df$time==3], df$`weight`[df$treatment=="b" & df$time==3], data = tidy_data) Error in df$weight : object of type 'closure' is not subsettable – DiscoR Oct 09 '17 at 21:32
  • This error is a result of variable naming differences. I stored your sample data as "df", while it looks like you stored yours as "tidy_data". So you can either try rerunning the t-test code after changing all references of "df" to "tidy_data", or vice versa. I've just edited the sample data provided in your question to be stored as "df" for clarity. – www Oct 09 '17 at 21:42
  • thanks a lot for helping with this! I got it working now! Do you think the format I have my data in could be tidier? And what concept/topics should I read up on to know how to do subsetting like the way you described above? – DiscoR Oct 09 '17 at 21:57
  • @DiscoR - I think your dataset looks good as it is. The only thing I would mention is that it's usually easier to work with data in R if column names don't have spaces, but it's not a requirement. For reading, I'd recommend learning about what are called operators in R. Knowing how to use brackets [ ] and the & and | symbols will be very helpful. I don't know of a single resource with all this information together, but in general you can find it all here on Stack Overflow using the search feature. Good luck and welcome to R. – www Oct 09 '17 at 22:33
0

For reference, the OP's data look like this:

head(df, 3)

# code  treatment  sex  time  outcome 1  outcome 2  outcome 3
# 100   a          f       1         21         21         62
# 100   a          f       2         23         32         84
# 100   a          f       3         33         33         63

To compare outcome 1 means by treatment when time = 3, we can use the subset argument of t.test:

t.test(df$`outcome 1` ~ df$treatment, subset = df$time==3)

#   Welch Two Sample t-test
# 
# data:  df$`outcome 1` by df$treatment
# t = -3.3773, df = 1.9245, p-value = 0.08182
# alternative hypothesis: true difference in means between group a and group b is not equal to 0
# 95 percent confidence interval:
#  -55.754265   7.754265
# sample estimates:
# mean in group a mean in group b 
#            27.5            51.5 
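
Column names that contain spaces need backticks wherever they are used. Here is a sketch of one alternative, assuming you are free to rename the columns: base R's `make.names()` converts the names to syntactic ones, after which the formula interface with `data =` reads more cleanly. `df2` is a hypothetical copy holding only the time 1 and time 3 rows of `outcome 1` from the question.

```r
# Hypothetical copy of a few rows from the question's data
# (time 1 and time 3 only), keeping the original column name with a space
df2 <- data.frame(
  code        = c(100, 100, 101, 101, 102, 102, 103, 103),
  treatment   = c("a", "a", "b", "b", "a", "a", "b", "b"),
  time        = c(1, 3, 1, 3, 1, 3, 1, 3),
  `outcome 1` = c(21, 33, 44, 47, 22, 22, 55, 56),
  check.names = FALSE
)

# make.names() turns "outcome 1" into the syntactic name "outcome.1"
names(df2) <- make.names(names(df2))

# With syntactic names, columns can be referenced directly via `data =`,
# and `subset` is evaluated inside the data frame
res <- t.test(outcome.1 ~ treatment, data = df2, subset = time == 3)
res$estimate
```

The group means come out the same as above (27.5 for a, 51.5 for b), since the same time 3 observations are being compared.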
wes