4

I have two data frames and would like to run independent two-sample t-tests on matching rows, i.e. t.test(y1, y2), where y1 is a row of dataframe1 and y2 is the matching row of dataframe2.

What's the best way of accomplishing this?

EDIT: I just found that I can index matching rows with dataframe1[i,] and dataframe2[i,], which will work in a loop. Is that the best solution?

Karolis Koncevičius
bdeonovic

2 Answers

5

The approach you outlined is reasonable; just make sure to preallocate your storage vector. I'd double-check that you really want to compare the rows instead of the columns. In most datasets I work with, each row is a unit of observation and the columns represent separate responses or variables of interest. Regardless, it's your data, so if that's what you need to do, here's an approach:

#Fake data
df1 <- data.frame(matrix(runif(100),10))
df2 <- data.frame(matrix(runif(100),10))


#Preallocate results
testresults <- vector("list", nrow(df1))
#For loop over rows; unlist() turns each one-row data frame into a numeric vector
for (j in seq_len(nrow(df1))){
  testresults[[j]] <- t.test(unlist(df1[j,]), unlist(df2[j,]))
}

You now have a list with one element per row of df1. I would then recommend using lapply and sapply to extract individual components from the list.
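
As a minimal sketch of that extraction step: the block below rebuilds the fake data with a seed so it runs on its own, then pulls components out of each test result. The names pvals, tstats and confints are illustrative, and the unlist() calls are an assumption added here so that each row reaches t.test as a plain numeric vector.

```r
# Fake data, regenerated with a seed so this block is self-contained
set.seed(1)
df1 <- data.frame(matrix(runif(100), 10))
df2 <- data.frame(matrix(runif(100), 10))

# Row-wise t-tests stored in a preallocated list
testresults <- vector("list", nrow(df1))
for (j in seq_len(nrow(df1))) {
  testresults[[j]] <- t.test(unlist(df1[j, ]), unlist(df2[j, ]))
}

# Each element is an "htest" object; sapply collects one value per test
pvals  <- sapply(testresults, function(res) res$p.value)
tstats <- sapply(testresults, function(res) unname(res$statistic))

# lapply keeps a list, which suits multi-valued components such as conf.int
confints <- lapply(testresults, function(res) res$conf.int)
```

pvals and tstats are numeric vectors with one entry per row of df1, so they can be attached back to the data as extra columns if that is convenient.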

Chase
  • Thank you for the words of wisdom. I indeed do have to do ttests across the rows, perhaps poor file design on my part. I'll keep it in mind for next time! – bdeonovic Sep 21 '12 at 00:58
  • @Chase I wonder whether you can do a paired t-test using 'paired = T' with your solution. At the moment it gives me an error... – Geek On Acid Jul 24 '13 at 08:19
3

It would make more sense to have your data stored as columns.

You can transpose a data.frame by

df1_t <- as.data.frame(t(df1))
df2_t <- as.data.frame(t(df2))

Then you can use mapply to cycle through the two data.frames a column at a time

t.test_results <- mapply(t.test, x = df1_t, y = df2_t, SIMPLIFY = FALSE)

Or you could use Map, which is a simple wrapper for mapply with SIMPLIFY = FALSE (thus saving keystrokes!)

t.test_results <- Map(t.test, x = df1_t, y = df2_t)
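
If every test needs the same extra argument, such as paired = TRUE, one option is the MoreArgs parameter of mapply. This is only a sketch: the fake data is regenerated with a seed, paired_results and paired_pvals are illustrative names, and whether a paired test is appropriate depends on your data.

```r
# Fake data, regenerated with a seed so this block is self-contained
set.seed(1)
df1 <- data.frame(matrix(runif(100), 10))
df2 <- data.frame(matrix(runif(100), 10))

# Transpose so that each original row becomes a column
df1_t <- as.data.frame(t(df1))
df2_t <- as.data.frame(t(df2))

# MoreArgs passes the same extra arguments to every t.test call
paired_results <- mapply(t.test, x = df1_t, y = df2_t,
                         MoreArgs = list(paired = TRUE), SIMPLIFY = FALSE)

# One p-value per original row
paired_pvals <- sapply(paired_results, function(res) res$p.value)
```

Because the columns of a transposed data frame are already numeric vectors, t.test receives x and y in the form it expects.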
mnel