I'm trying to do some parametrised dplyr manipulations. The simplest reproducible example to express the root of the problem is this:

library(dplyr)
library(lazyeval)  # provides interp()

# Data
test <- data.frame(group = rep(1:5, each = 2),
                   value = as.integer(c(NA, NA, 2, 3, 3, 5, 7, 8, 9, 0)))

> test
    group value
1      1    NA
2      1    NA
3      2     2
4      2     3
5      3     3
6      3     5
7      4     7
8      4     8
9      5     9
10     5     0 

# Summarisation example, this is what I'd like to parametrise
# so that I can pass in functions and grouping variables dynamically

test.summary <- test %>% 
                group_by(group) %>% 
                summarise(group.mean = mean(value, na.rm = TRUE))

> test.summary
Source: local data frame [5 x 2]

    group group.mean
    <int>      <dbl>
1     1        NaN
2     2        2.5
3     3        4.0  # Correct results
4     4        7.5
5     5        4.5

This is how far I got on my own:

# This works fine, but notice there's no 'na.rm = TRUE' passed in

doSummary <- function(d_in = data, func = 'mean', by = 'group') {
# d_in: data in
# func: required function for summarising
# by:   the variable to group by 
# NOTE: the summary is always for the 'value' column in any given dataframe

    # Operations for summarise_
    ops <- interp(~f(value), 
                  .values = list(f = as.name(func),
                                 value = as.name('value')))        
    d_out <- d_in %>% 
             group_by_(by) %>% 
             summarise_(.dots = setNames(ops, func))
}

> doSummary(test)
Source: local data frame [5 x 2]

  group mean(value)
  <int>       <dbl>
1     1          NA
2     2         2.5
3     3         4.0
4     4         7.5
5     5         4.5

Trying with the 'na.rm' parameter

# When I try passing in the 'na.rm = T' parameter it breaks
doSummary.na <- function(d_in = data, func = 'mean', by = 'group') {
    # Doesn't work
    ops <- interp(~do.call(f, args), 
                  .values = list(f = func,
                                 args = list(as.name('value'), na.rm = TRUE)))

    d_out <- d_in %>% 
             group_by_(by) %>% 
             summarise_(.dots = setNames(ops, func))
}

> doSummary.na(test)
Error: object 'value' not found 

Many thanks for your help!

pfabri
  • And `interp` comes from ...? – nicola Jun 14 '16 at 15:21
  • @pfabri The key bit of information missing is that `interp()` is from package lazyeval, there are other functions with the same name, for example in akima – Miff Jun 14 '16 at 15:34
  • @pfabri I can't tell if the following might work in your case, although it doesn't directly answer your question `interp(~do.call(f,args), .values = list(f = 'mean',args=list(na.rm=TRUE)))`. – Miff Jun 14 '16 at 15:56
  • @pfabri `interp(~do.call(f,args), .values = list(f = 'mean',args=list(steps, na.rm=TRUE)))` – Miff Jun 15 '16 at 08:00
  • I've implemented all the suggested changes and rewritten the question to be fully reproducible, yet concise (I hope). – pfabri Jun 15 '16 at 19:21

1 Answer

Your title mentions ... but your question doesn’t. If we don’t need to deal with ..., the answer gets a lot easier: we don’t need do.call at all and can call the function directly. Simply replace your ops definition with:

ops = interp(~f(value, na.rm = TRUE),
             f = match.fun(func), value = as.name('value'))

Note that I’ve used match.fun here instead of as.name. This is generally a better idea since it works “just like R” for function lookup. As a consequence, you can pass not only a function name as a character string but also a function object or an anonymous function:

doSummary.na(test, function (x, ...) mean(x, ...) / sd(x, ...)) # x̄/s?! Whatever.

Speaking of which, your attempt to set the column names also fails; you need to put ops into a list to fix that:

d_in %>%
    group_by_(by) %>%
    summarise_(.dots = setNames(list(ops), func))

… because .dots expects a list of operations (and setNames also expects a vector/list). However, this code still won’t work if the func you pass into the function isn’t a character vector, because setNames needs a character name. To make this more robust, use something like this:

fname = if (is.character(func)) {
        func
    } else if (is.name(substitute(func))) {
        as.character(substitute(func))
    } else {
        'func'
    }

d_in %>%
    group_by_(by) %>%
    summarise_(.dots = setNames(list(ops), fname))
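
The naming logic can be tried on its own in base R. The sketch below wraps it in a helper called label_for, a name made up for this illustration, and uses a plain formula as a stand-in for the result of interp:

```r
# label_for is a hypothetical helper, not part of the original function.
label_for <- function (func) {
    if (is.character(func)) {
        func
    } else if (is.name(substitute(func))) {
        as.character(substitute(func))
    } else {
        'func'
    }
}

name_chr  <- label_for('mean')                # function given as a string
name_sym  <- label_for(median)                # function given by name
name_anon <- label_for(function (x) sum(x))   # anonymous function

# Stand-in for the interp() result; wrapping it in list() yields the
# named list that .dots expects.
ops  <- ~mean(value, na.rm = TRUE)
dots <- setNames(list(ops), name_chr)
names(dots)
```

Here name_chr and name_sym come out as 'mean' and 'median', while the anonymous function falls back to the generic label 'func'.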

Things get more complicated if you actually want to allow passing ..., instead of known arguments, because (as far as I know) there’s simply no direct way of passing ... via interp, and, like you, I cannot get the do.call approach to work.

The ‹lazyeval› package provides the very nice function make_call, which helps us on the way to a solution. The above could also be written as

# Not good. :-(
ops = make_call(as.name(func), list(as.name('value'), na.rm = TRUE))

This works, but only when func is passed as a character vector. As explained above, that simply isn’t flexible enough.

However, make_call simply wraps base R’s as.call and we can use that directly:

ops = as.call(list(match.fun(func), as.name('value'), na.rm = TRUE))

And now we can simply pass ... on:

doSummary = function (d_in = data, func = 'mean', by = 'group', ...) {
    ops = as.call(list(match.fun(func), as.name('value'), ...))

    fname = if (is.character(func)) {
            func
        } else if (is.name(substitute(func))) {
            as.character(substitute(func))
        } else {
            'func'
        }

    d_in %>%
        group_by_(by) %>%
        summarize_(.dots = setNames(list(ops), fname))
}
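
The call construction itself can be checked without dplyr. The following is only a base R sketch: build_call is a hypothetical helper that isolates the as.call line, and the per-group evaluation uses split and sapply as a stand-in for group_by_ and summarise_, not as what dplyr does internally:

```r
test <- data.frame(group = rep(1:5, each = 2),
                   value = as.integer(c(NA, NA, 2, 3, 3, 5, 7, 8, 9, 0)))

# Hypothetical helper: same construction as in doSummary, with ... passed on.
build_call <- function (func, ...) {
    as.call(list(match.fun(func), as.name('value'), ...))
}

ops <- build_call('mean', na.rm = TRUE)

# Evaluating the call with a data frame as the environment resolves 'value'
# to the column of that name.
overall <- eval(ops, test)

# Evaluate the same call once per group.
per_group <- sapply(split(test, test$group), function (d) eval(ops, d))
per_group
```

The per-group values match the expected summary from the question (NaN, 2.5, 4.0, 7.5, 4.5), which suggests that na.rm = TRUE really does reach the summarising function through ....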

To be clear: the same could be achieved using interp but I think this would require manually building a formula object from a list, which amounts to doing very much the same as in my solution, and then (redundantly) calling interp on the result.

I generally find that while ‹lazyeval› is incredibly elegant, in some situations base R provides simpler solutions. In particular, interp is a powerful replacement for substitute, but bquote, a quite underused base R function, already provides many of the same syntactic benefits. The great benefit of ‹lazyeval› objects is that they carry around their evaluation environment, unlike base R expressions. However, this is simply not always needed.
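
As a rough illustration of the bquote point, the same kind of call can be assembled with base R alone. This is a sketch under the assumption that the column is called value, as in the question; it does not show how to forward ...:

```r
f_sym <- as.name('mean')     # function referenced by name
col   <- as.name('value')    # column to summarise

# .() splices the values of f_sym and col into the quoted expression.
ops <- bquote(.(f_sym)(.(col), na.rm = TRUE))
deparse(ops)

# A function object, as returned by match.fun, can be spliced in as well.
f_obj <- match.fun('mean')
ops2  <- bquote(.(f_obj)(.(col), na.rm = TRUE))

res  <- eval(ops,  list(value = c(1, NA, 3)))
res2 <- eval(ops2, list(value = c(1, NA, 3)))
```

Both calls evaluate to 2 for this input. Unlike a ‹lazyeval› formula, the resulting call does not carry an environment with it, so the data has to be supplied at evaluation time.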

Konrad Rudolph