I am trying to create a function that uses ddply to summarize data about a particular column that I pass in. I am able to reference the column I want outside of ddply, but I'm not sure how to do it within ddply:

library(plyr)  # provides ddply and summarize

exp_group <- c('test','test','control','control')
value <- c(1,3,2,3)

df <- data.frame(exp_group, value)

compare_means <- function(df,cols_detail, col_to_eval){
    df_int <- df[, c(cols_detail, col_to_eval)] # this part works fine

    summary <- ddply(df_int
                  , .(exp_group)
                  , summarize
                  , mean = t.test(col_to_eval)$estimate #these ones don't
                  , lo_bound = t.test(col_to_eval)$conf.int[1]
                  , hi_bound = t.test(col_to_eval)$conf.int[2]
                  )

  return(summary)
}

test <- compare_means(df, 'exp_group','value')

When I run this, I get an error saying col_to_eval is not found. I've also tried df_int[,col_to_eval] and df_int[,2] (referencing the column by position), and both report that df_int is not found.

I want to find the mean, with its confidence bounds, for each of the test and control groups.

How do I reference the column I want in the t.test functions?

emilylinndb
  • Could you provide a small reproducible example of your data? It would be helpful. – msoftrain Apr 15 '15 at 19:45
  • 1
    you have a typo in `conf.int[2]` http://stackoverflow.com/questions/18516548/use-ddply-within-a-function-and-include-variable-of-interest-as-an-argument – rawr Apr 15 '15 at 20:26
  • Thank you - fixed that, please see edits above of the issue at this point. – emilylinndb Apr 15 '15 at 20:57
  • ddply (like basically everything Hadley does) doesn't use standard evaluation, so when you call ddply the normal way, `col_to_eval` is assumed to be the name of a column in the data set, df_int. It doesn't find one, so you get the error. The dplyr package has separate functions for standard and non-standard evaluation, which is probably more straightforward than your workaround. – rawr Apr 16 '15 at 01:10
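To illustrate the distinction rawr describes, here is a minimal base-R sketch (not plyr's actual internals) that uses the same data as the question. The variable names `nse_mean` and `se_mean` are made up for this illustration.

```r
# Same toy data as in the question
df <- data.frame(exp_group = c('test', 'test', 'control', 'control'),
                 value = c(1, 3, 2, 3))
col_to_eval <- 'value'

# Non-standard evaluation: the bare name `value` is looked up
# as a column of df, much like the expressions given to summarize.
nse_mean <- with(df, mean(value))

# Standard evaluation: the column name is held in a string,
# so the column has to be selected explicitly with [[ ]].
se_mean <- mean(df[[col_to_eval]])

print(c(nse_mean, se_mean))
```

Both calls compute the mean of the same column. The difference is only in how the column is located: by a bare name in the first call, by a string in the second.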

1 Answer

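OK, I went through a few iterations and finally got it to work. Passing an anonymous function instead of summarize means each group's subset arrives as an ordinary data frame x, so the column can be selected by the name stored in col_to_eval: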

library(plyr)  # provides ddply

exp_group <- c('test','test','control','control')
value <- c(1,3,2,3)

df <- data.frame(exp_group, value)

compare_means <- function(df, cols_detail, col_to_eval){
  # keep only the grouping column(s) and the column being evaluated
  df_int <- df[, c(cols_detail, col_to_eval)]

  summary <- ddply(df_int
                   , .(exp_group)
                   , function(x){
                     # x is the subset of df_int for one group;
                     # run t.test once and reuse the result
                     tt <- t.test(x[, col_to_eval])
                     data.frame(mean = tt$estimate
                                , lo_bound = tt$conf.int[1]
                                , hi_bound = tt$conf.int[2])
                   }
  )

  return(summary)
}

test <- compare_means(df, 'exp_group', 'value')
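For comparison, the same per-group summary can be sketched in base R without plyr. This is only an illustration, not the approach from the answer above. The function name `compare_means_base` and the output column `group` are made up here, and the grouping comes from `cols_detail` rather than a hard-coded `exp_group`.

```r
# Same toy data as in the question
df <- data.frame(exp_group = c('test', 'test', 'control', 'control'),
                 value = c(1, 3, 2, 3))

# Hypothetical helper: one-sample t.test per group, using only base R
compare_means_base <- function(df, cols_detail, col_to_eval) {
  # split the rows by the grouping column(s) named in cols_detail
  groups <- split(df, df[cols_detail], drop = TRUE)

  rows <- lapply(names(groups), function(g) {
    # select the column by its string name, then test it once
    tt <- t.test(groups[[g]][[col_to_eval]])
    data.frame(group = g,
               mean = unname(tt$estimate),
               lo_bound = tt$conf.int[1],
               hi_bound = tt$conf.int[2])
  })

  # stack the one-row data frames into a single summary
  do.call(rbind, rows)
}

result <- compare_means_base(df, 'exp_group', 'value')
print(result)
```

With only two observations per group the confidence intervals are very wide, which is expected for this toy data.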
emilylinndb