
I'm not sure whether my question is too abstract or too theoretical to warrant responses from the folks here, but here goes nothing: Is this a job for mapply?

I'm working with a data set over which (at least) 6 different types of summaries are calculated for the same 21 subsets of my data such that subset type and summary type are ultimately fully crossed:

 - set.1.sum.1, set.2.sum.1...set.21.sum.1
 - set.1.sum.2, set.2.sum.2...set.21.sum.2
 - ...
 - set.1.sum.6, set.2.sum.6...set.21.sum.6

My function (set.sum.fun), which is applied via lapply to a list of dataframes that contain the subsetting specifications, requires me to explicitly specify the summary type as an argument. The subsetted data have differing numbers of rows but the same columns. To illustrate with the baseball dataset, using a list of 2 subsetting dfs and 2 summary types:

require(plyr)
require(dplyr)
require(reshape2)
require(lazyeval)

# List of dataframes with various subsetting specifications for main df (baseball)
mylist <- list(
df1  = data_frame( id.filter = c( "aaronha01")), # w/ other conds. in real data
df2  = data_frame( id.filter = c( "zimmech01"))
# ...up to df21 in real data
)


# Function: 
# df =  dataframes (i.e., from mylist)
# sum.type = a quoted sum.type
# other.arg = an additional argument for illustrative purposes here

set.sum.fun <- function( df, 
sum.type = c( "SUM", "SQRT"), 
other.arg = c( "Monday", "Tuesday")) {

if ( sum.type == "SUM")
 {
  df <- baseball %>%
    filter( id == eval( quote( df$id.filter))) %>%
    group_by_( .dots = c( "lg")) %>%
    summarize_( smashes = interp( ~ sum( hr))) %>%
    mutate( new.id = paste( eval( quote( df$id.filter[1])), eval( quote( sum.type)),
    eval( quote( other.arg)), sep = "."))
 }
else if ( sum.type == "SQRT")
 {
  df <- baseball %>%
    filter( id == eval( quote( df$id.filter))) %>%
    group_by_( .dots = c( "lg")) %>%
    ddply( .( hr), transform, sqrt.hr = sqrt( hr)) %>%
    summarize_( smashes = interp( ~ sum( sqrt.hr))) %>%
    mutate( new.id = paste( eval( quote( df$id.filter[1])), eval( quote( sum.type)), eval( quote( other.arg)), sep = "."))
  }
}

I'm then creating a new list of dataframes for each sum type from the original list mylist:

sum.mylist  <- mylist %>% lapply( set.sum.fun, "SUM", "Monday")
sqrt.mylist <- mylist %>% lapply( set.sum.fun, "SQRT", "Monday")

This is clearly not an optimal way of handling the situation. Among the many problems with my solution are: 1. the summary type has to be specified manually, where passing a list of summary types to the sum.type argument would be ideal; and 2. each new list has to be created manually, where appending the output to a single list or one large df would be preferable.

I've read some postings about using multiple lists (here and here) with mapply, but I don't seem to be able to apply those answers to the problem I'm trying to solve.
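
For what it's worth, here is a rough sketch of how the mapply idea might look, not a tested solution for the real data. It builds every subset x summary combination with expand.grid and then steps through the two columns in parallel with Map (which is mapply with SIMPLIFY = FALSE). Note that `toy`, `specs` and `toy.fun` are hypothetical stand-ins for baseball, mylist and set.sum.fun, written in base R only so the example runs on its own.

```r
# Hypothetical stand-in data; NOT the baseball dataset
toy <- data.frame(
  id = c("a", "a", "b", "b"),
  lg = c("NL", "AL", "NL", "NL"),
  hr = c(4, 9, 16, 25),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for mylist: one filter specification per subset
specs <- list(
  df1 = data.frame(id.filter = "a", stringsAsFactors = FALSE),
  df2 = data.frame(id.filter = "b", stringsAsFactors = FALSE)
)

# Hypothetical stand-in for set.sum.fun: takes one subset spec and one summary type
toy.fun <- function(df, sum.type, other.arg = "Monday") {
  rows <- toy[toy$id == df$id.filter[1], ]
  vals <- if (sum.type == "SQRT") sqrt(rows$hr) else rows$hr
  out <- aggregate(data.frame(smashes = vals), by = list(lg = rows$lg), FUN = sum)
  out$new.id <- paste(df$id.filter[1], sum.type, other.arg, sep = ".")
  out
}

# Fully cross the subset names with the summary types
grid <- expand.grid(
  set = names(specs),
  sum.type = c("SUM", "SQRT"),
  stringsAsFactors = FALSE
)

# Map() walks both grid columns in parallel, one call per combination
all.list <- Map(
  function(set, sum.type) toy.fun(specs[[set]], sum.type),
  grid$set, grid$sum.type
)
names(all.list) <- paste(grid$set, grid$sum.type, sep = ".")

# Every piece has the same columns, so they stack into one data frame
all.df <- do.call(rbind, all.list)
```

The grid makes the crossing explicit, so adding a seventh summary type or a third argument would only mean another column in `grid` and another argument to Map.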

Steve'sConnect
  • you want to combine `sum.mylist` and `sqrt.mylist` or just do them simultaneously to get something like `c(sum.mylist, sqrt.mylist)`? – rawr Jun 15 '16 at 21:37
  • okay well anyway. you could `v <- Vectorize(set.sum.fun, 'sum.type', SIMPLIFY = FALSE)` and you are doing `mylist %>% lapply(v, "SUM", "Monday")` but now you can `mylist %>% lapply(v, c("SUM", 'SQRT'), "Monday")` – rawr Jun 15 '16 at 21:56
  • Ideally, a single list of all the subset x summary dfs would be preferred. Since each df has a unique `new.id` column, and each df has the same number of columns, I can `rbind` them all into a single df if they're in the same list. Thanks for giving me a chance to clarify! – Steve'sConnect Jun 15 '16 at 22:08
  • @rawr-- Amazing...I think Vectorize will change my life! THANKS. – Steve'sConnect Jun 16 '16 at 02:03
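
The Vectorize suggestion from the comments could be sketched roughly as follows. This is only an illustration under assumptions: `toy.fun` and `specs` are hypothetical stand-ins for set.sum.fun and mylist (base R only, no baseball data), and the final flattening step with unlist and rbind is one possible way to get the single df mentioned above, not something the comments spell out.

```r
# Hypothetical stand-in for set.sum.fun; it only builds the new.id label
toy.fun <- function(df, sum.type, other.arg) {
  data.frame(
    new.id = paste(df$id.filter[1], sum.type, other.arg, sep = "."),
    stringsAsFactors = FALSE
  )
}

# Hypothetical stand-in for mylist
specs <- list(
  df1 = data.frame(id.filter = "aaronha01", stringsAsFactors = FALSE),
  df2 = data.frame(id.filter = "zimmech01", stringsAsFactors = FALSE)
)

# Vectorize over sum.type only; SIMPLIFY = FALSE keeps each result a data frame
v <- Vectorize(toy.fun, "sum.type", SIMPLIFY = FALSE)

# Each subset now yields a list with one data frame per summary type
nested <- lapply(specs, v, c("SUM", "SQRT"), "Monday")

# Flatten one level (subset x summary) and stack everything into one data frame
flat <- unlist(nested, recursive = FALSE)
all.df <- do.call(rbind, flat)
```

Because only `sum.type` is vectorized, `df` and `other.arg` are passed through unchanged on every call, which matches the original lapply usage.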
