I have a very large dataset and want to write concise code for the analysis.

Here is an example for illustration:

df <- data.frame(
ID = factor(sample(c("A","B","C","D","E","F","G"), 20, replace=TRUE)),
a1 = runif(20),
a2 = runif(20),
a3 = runif(20),
a4 = runif(20),
b1 = runif(20),
b2 = runif(20),
b3 = runif(20),
b4 = runif(20))

I would like to run paired-samples t-tests like this (example):

t.test(df$a1, df$b1, paired=TRUE, na.rm=TRUE)
t.test(df$a2, df$b2, paired=TRUE, na.rm=TRUE)

That works, but I want shorter code, so I tried this:

object_a <- paste("a", 1:4, sep="")
object_b <- paste("b", 1:4, sep="")

t.test.func.paired <- function(x) {
 t.test(x, y, paired = TRUE, na.rm=TRUE)
}
df %>%
select_(.dots = c(object_a, object_b)) %>%
sapply(., t.test.func.paired) %>%
.[c("statistic", "parameter", "p.value"), ] %>%
View()

Unfortunately, that does not work. Where is the error? Thank you!

Joelle
  • Something like this? http://stackoverflow.com/questions/37474672/looping-through-t-tests-for-data-frame-subsets-in-r/37479506#37479506 – coffeinjunky Feb 16 '17 at 14:09
  • Instead of `df$a1`, you could use `df[, "a1"]`. Then your pasted column names will work. As an alternative, you could store the a and b columns in separate lists and then refer to the list elements by position. – lmo Feb 16 '17 at 14:09

1 Answer

Here is an approach that uses the dplyr and broom packages. broom returns the t.test results as a data frame, so you won't have to extract the individual values yourself.

The key is to build every pair of variables you want to compare and run the appropriate test for each pair. Note that this relies on the column names appearing in matching order (a1, a2, ..., b1, b2, ...), so that the first a column lines up with the first b column, and so on. dplyr lets you avoid writing a for loop over the pairs.

library(dplyr)
library(broom)

# dataset
df <- data.frame(
  ID = factor(sample(c("A","B","C","D","E","F","G"), 20, replace=TRUE)),
  a1 = runif(20),
  a2 = runif(20),
  a3 = runif(20),
  a4 = runif(20),
  b1 = runif(20),
  b2 = runif(20),
  b3 = runif(20),
  b4 = runif(20))

# split the column names by prefix (anchored with ^ so that only names starting with a or b match)
object_a <- grep("^a", names(df), value = TRUE)
object_b <- grep("^b", names(df), value = TRUE)


cbind(object_a, object_b) %>%                      # pair up the column names
  data.frame(., stringsAsFactors = FALSE) %>%      # one row per (a, b) pair
  rowwise() %>%                                    # for each row
  do(data.frame(.,                                 # keep the column names
                tidy(t.test(df[, .$object_a],      # tidy() returns the t.test results as a data frame
                            df[, .$object_b],
                            paired = TRUE)))) %>%  # no na.rm: t.test() has no such argument and drops incomplete pairs itself
  ungroup                                          # drop the rowwise grouping

# # A tibble: 4 × 10
#   object_a object_b    estimate  statistic   p.value parameter   conf.low  conf.high        method alternative
# *    <chr>    <chr>       <dbl>      <dbl>     <dbl>     <dbl>      <dbl>      <dbl>        <fctr>      <fctr>
# 1       a1       b1 -0.03689665 -0.5253532 0.6054150        19 -0.1838941 0.11010078 Paired t-test   two.sided
# 2       a2       b2 -0.09111585 -1.2358669 0.2315703        19 -0.2454267 0.06319499 Paired t-test   two.sided
# 3       a3       b3  0.07515723  0.7721983 0.4494961        19 -0.1285545 0.27886900 Paired t-test   two.sided
# 4       a4       b4  0.04359102  0.4317255 0.6708003        19 -0.1677402 0.25492223 Paired t-test   two.sided
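
As for the error in the original attempt: `y` is never defined inside `t.test.func.paired`, and `sapply()` hands the function one column at a time, so no second column is ever available to pair with. If you prefer to stay close to that attempt, the same pairing idea can be sketched in base R with `mapply()`, which walks over two vectors of column names in parallel. This is only a minimal sketch: the two-pair dataset, the `set.seed()` call, and the name `paired_results` are assumptions made for illustration.

```r
set.seed(1)  # assumption: seed added only to make the sketch reproducible

# small illustrative dataset with two a/b pairs
df <- data.frame(
  a1 = runif(20), a2 = runif(20),
  b1 = runif(20), b2 = runif(20))

# column names by prefix, in matching order
object_a <- grep("^a", names(df), value = TRUE)
object_b <- grep("^b", names(df), value = TRUE)

# run one paired t-test per (a, b) pair and keep three values from each
paired_results <- mapply(function(a, b) {
  res <- t.test(df[[a]], df[[b]], paired = TRUE)
  c(statistic = unname(res$statistic),
    parameter = unname(res$parameter),
    p.value   = res$p.value)
}, object_a, object_b)

paired_results  # matrix: one column per pair, rows statistic / parameter / p.value
```

The result is a matrix whose columns are labelled with the a column names, so `paired_results[, "a1"]` holds the test of a1 against b1.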
AntoniosK