How can I use mutate (I presume I need standard evaluation here, and hence mutate_, though I am not entirely sure) inside a function that accepts a list of variable names, such as this:

createSum = function(data, variableNames) {
  data %>% 
    mutate_(sumvar = interp(~ sum(var, na.rm = TRUE), 
                            var = as.name(paste(as.character(variableNames), collapse =","))))

}

Here is an MWE that strips the function to its core logic and demonstrates what I am trying to achieve:

library(dplyr)
library(lazyeval)

# function to make random table with given column names
makeTable = function(colNames, sampleSize) {
  liSample = lapply(colNames, function(week) {
    sample = rnorm(sampleSize)
  })
  names(liSample) = as.character(colNames)
  return(tbl_df(data.frame(liSample, check.names = FALSE)))
}

# create some sample data with the column name patterns required
weekDates = seq.Date(from = as.Date("2014-01-01"),
                     to = as.Date("2014-08-01"), by = "week")
dfTest = makeTable(weekDates, 10)

# test mutate on this table
dfTest %>% 
  mutate_(sumvar = interp(~ sum(var, na.rm = TRUE), 
                          var = as.name(paste(as.character(weekDates), collapse =","))))

Expected output here is what would be returned by:

rowSums(dfTest[, as.character(weekDates)])
tchakravarty
  • You define `makeTable` but then call `makeDataFrame`. Are these supposed to be the same function? It would be helpful to describe the output you expect for this sample input (set a seed so the data is reproducible). – MrFlick May 07 '15 at 17:46
  • @MrFlick Thanks. Changed the function name. Nothing fancy is expected, just the `sum`, by row, of all the variables whose names are passed to the function. Will update with expected output. – tchakravarty May 07 '15 at 17:47

2 Answers

I think this is what you're after

createSum = function(data, variableNames) {
    data %>% 
        mutate_(sumvar = paste(as.character(variableNames), collapse ="+"))
}
createSum(dfTest, weekDates)

Here we supply a character string rather than using interp, because you can't pass a list of names as a single parameter to a function. Also, sum() would collapse everything into a single value: operations are not performed row-wise, and each column is passed in as a whole vector.
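To make the point about sum() concrete, here is a minimal base R sketch. The data frame `df` and its columns `a` and `b` are made up for illustration and are not part of the question's data.

```r
# Toy data frame; the column names a and b are illustrative only.
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# sum() receives whole columns and collapses them into one number.
total <- sum(df$a, df$b)

# `+` is vectorised, so it adds element by element, i.e. row by row.
byRow <- df$a + df$b

# rowSums() gives the same row-wise result for any number of columns.
byRowSums <- rowSums(df[, c("a", "b")])
```

Inside mutate, a length-one result like `total` would be recycled down the whole column, which is why pasting the names together with "+" behaves differently from wrapping them in sum().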

The other problem with this example is that you set check.names=FALSE in your data.frame, which means that you've created column names that are not syntactically valid symbols. You can explicitly wrap your variable names in back-ticks if you like:

createSum(dfTest , paste0("`", weekDates,"`"))

but in general it would be better not to use invalid names.
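If pasting back-ticks into strings feels fragile, one possible alternative is to build the expression from symbols instead of text. The following is only a sketch in base R, using a made-up two-column data frame; `sumExpr` and `sumvar` are hypothetical names, and it is evaluated with eval() rather than through dplyr.

```r
# Toy data with non-syntactic column names (check.names = FALSE keeps them).
df <- data.frame(`2014-01-01` = c(1, 2), `2014-01-08` = c(10, 20),
                 check.names = FALSE)
vars <- names(df)

# Turn each name into a symbol, then fold the symbols into a single call
# equivalent to `2014-01-01` + `2014-01-08`.
sumExpr <- Reduce(function(lhs, rhs) call("+", lhs, rhs),
                  lapply(vars, as.name))

# Evaluate the call with the data frame's columns in scope.
sumvar <- eval(sumExpr, df)
```

Because as.name() creates the symbol directly, no quoting or parsing of strings is involved, so the invalid names do not need back-ticks.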

MrFlick
  • Thanks, this would work, but what if the function did not have a handy operator symbol? Secondly, how then would I pass a list of arguments by name to `...` in a function? The only examples of standard evaluation I have seen involve one variable name. – tchakravarty May 07 '15 at 18:04
  • It's not easy to talk about the hypothetical; every function might be different. But this method of string building should work for many other functions (`sum` is somewhat of an exception). The paste might just look like `paste0("funname(", paste(vars, collapse=","), ")")` – MrFlick May 07 '15 at 18:08
  • Yeah, as I feared, that looks like terrible syntax (through no fault of yours!). I don't think I understand the paradigm very well, though: all I want to be able to do is evaluate variable symbols in the environment of the `data_frame`. Surely there is a neater way to do this without resorting to cumbersome expression building or `eval(parse(text = ...))`. – tchakravarty May 07 '15 at 18:12
  • I've looked into this before: http://stackoverflow.com/questions/28751023/performing-dplyr-mutate-on-subset-of-columns. I'm not sure exactly what syntax you were envisioning, but it's just not pretty building dynamic expressions most of the time. I agree that evaluating parsed strings isn't a good idea and should be avoided if possible. – MrFlick May 07 '15 at 18:58

I don't know if this is an "officially sanctioned" dplyr way, but this is a possibility:

weekDates = as.character(weekDates) # more convenient

dfTest %>% mutate(sumvar = Reduce(`+`, lapply(weekDates, get, .)))
#or
dfTest %>% mutate(sumvar = rowSums(as.data.frame(lapply(weekDates, get, .))))

This can carry a significant performance penalty, depending on your particular usage: in addition to dplyr's regular copying of the entire data, I think it also copies the data a second time during that internal computation. You can look into data.table to avoid the extra copying by adding columns in place (and using .SDcols to avoid the second copy), and you'll get arguably better syntax.
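A rough sketch of the data.table approach mentioned above, assuming the data.table package is installed. The table `dt` and the vector `cols` are illustrative stand-ins for dfTest and weekDates.

```r
library(data.table)

# Toy table with date-like column names (data.table keeps them as given).
dt <- data.table(`2014-01-01` = c(1, 2), `2014-01-08` = c(10, 20))
cols <- c("2014-01-01", "2014-01-08")

# := adds sumvar by reference; .SDcols limits .SD to the listed columns.
dt[, sumvar := rowSums(.SD), .SDcols = cols]
```

The column names are passed as an ordinary character vector, so no expression building is needed.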

eddi