I'm not sure whether my question is too abstract or too theoretical to warrant responses from the folks here, but here goes nothing: Is this a job for mapply?
I'm working with a data set over which (at least) 6 different types of summaries are calculated for the same 21 subsets of my data such that subset type and summary type are ultimately fully crossed:
- set.1.sum.1, set.2.sum.1...set.21.sum.1
- set.1.sum.2, set.2.sum.2...set.21.sum.2
- ...
- set.1.sum.6, set.2.sum.6...set.21.sum.6
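To make "fully crossed" concrete, here is a minimal base R sketch that only generates the labels above. The names combos and combo.labels are made up for illustration and are not part of my real code.

```r
# Hypothetical illustration: enumerate all 21 x 6 subset/summary combinations
combos <- expand.grid(set = 1:21, sum = 1:6)

# expand.grid varies the first factor (set) fastest, so the labels run
# set.1.sum.1, set.2.sum.1, ..., set.21.sum.1, set.1.sum.2, ...
combo.labels <- paste0("set.", combos$set, ".sum.", combos$sum)

length(combo.labels)   # 126 combinations in total
head(combo.labels, 3)
```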
My function (set.sum.fun), which is applied via lapply to a list of dataframes that contain the subsetting specifications, requires me to explicitly specify the summary type as an argument. The subsetted data have differing numbers of rows but the same columns. To illustrate with the baseball dataset, using a list of 2 subsetting dfs and 2 summary types:
require(plyr)
require(dplyr)
require(reshape2)
require(lazyeval)
# List of dataframes with various subsetting specifications for main df (baseball)
mylist <- list(
df1 = data_frame( id.filter = c( "aaronha01")), # w/ other conds. in real data
df2 = data_frame( id.filter = c( "zimmech01"))
# ...up to df21 in real data
)
# Function:
# df = dataframes (i.e., from mylist)
# sum.type = a quoted sum.type
# other.arg = an additional argument for illustrative purposes here
set.sum.fun <- function( df,
sum.type = c( "SUM", "SQRT"),
other.arg = c( "Monday", "Tuesday")) {
if ( sum.type == "SUM")
{
df <- baseball %>%
filter( id == eval( quote( df$id.filter))) %>%
group_by_( .dots = c( "lg")) %>%
summarize_( smashes = interp( ~ sum( hr))) %>%
mutate( new.id = paste( eval( quote( df$id.filter[1])), eval( quote( sum.type)),
eval( quote( other.arg)), sep = "."))
}
else if ( sum.type == "SQRT")
{
df <- baseball %>%
filter( id == eval( quote( df$id.filter))) %>%
group_by_( .dots = c( "lg")) %>%
ddply( .( hr), transform, sqrt.hr = sqrt( hr)) %>%
summarize_( smashes = interp( ~ sum( sqrt.hr))) %>%
mutate( new.id = paste( eval( quote( df$id.filter[1])), eval( quote( sum.type)), eval( quote( other.arg)), sep = "."))
}
}
I'm then creating a new list of dataframes for each summary type from the original list mylist:
sum.mylist <- mylist %>% lapply( set.sum.fun, "SUM", "Monday")
sqrt.mylist <- mylist %>% lapply( set.sum.fun, "SQRT", "Monday")
This is clearly not an optimal way of handling this situation. Two of the many problems with my solution are: 1. the manual specification of the summary type, where passing a list of summary types to the sum.type argument would be ideal; and 2. the manual creation of new lists, where appending the output to a single list or combining it into one large df would be preferable.
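To show the shape of output I'm after, here is a rough base R sketch. It uses a hypothetical stand-in function (toy.fun) and a hypothetical toy.list in place of set.sum.fun and mylist, and it assumes that crossing the two inputs with expand.grid and walking the pairs with mapply is a reasonable approach. Whether that carries over to my real function is exactly what I'm unsure about.

```r
# Hypothetical stand-ins: toy.list and toy.fun are NOT my real objects
toy.list <- list(
  df1 = data.frame(id.filter = "aaronha01", stringsAsFactors = FALSE),
  df2 = data.frame(id.filter = "zimmech01", stringsAsFactors = FALSE)
)
sum.types <- c("SUM", "SQRT")

# Toy function: only builds the identifier that set.sum.fun would attach
toy.fun <- function(df, sum.type, other.arg = "Monday") {
  data.frame(new.id = paste(df$id.filter[1], sum.type, other.arg, sep = "."),
             stringsAsFactors = FALSE)
}

# One row per (subset, summary type) pair
combos <- expand.grid(set = names(toy.list), sum.type = sum.types,
                      stringsAsFactors = FALSE)

# mapply steps through both columns in parallel;
# SIMPLIFY = FALSE keeps the results as a list of data frames
out.list <- mapply(
  function(set, sum.type) toy.fun(toy.list[[set]], sum.type),
  combos$set, combos$sum.type,
  SIMPLIFY = FALSE, USE.NAMES = FALSE
)

# Stack everything into a single data frame
out.df <- do.call(rbind, out.list)
```

With this toy setup the result is a single data frame with one row per combination instead of one hand-made list per summary type.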
I've read some postings about using multiple lists (here and here) with mapply, but I don't seem to be able to apply those answers to the problem I'm trying to solve.