
I'm self-taught in R and this is my first StackOverflow question. I apologize if this is an obvious issue; please be kind.

Short Version of my Question
I wrote a custom function to calculate the percent change in a variable year over year. I would like to use purrr's map_at function to apply my custom function to a vector of variable names. My custom function works when applied to a single variable, but fails when I chain it using map_at.

My custom function

library(dplyr)

calculate_delta <- function(df, col) {

  # generate variable name
  newcolname <- paste("d", col, sep = "")

  # get formula for first difference
  calculate_diff <- lazyeval::interp(~(a + lag(a))/a, a = as.name(col))

  # pass formula to mutate, name new variable with the column name generated above
  df %>%
    mutate_(.dots = setNames(list(calculate_diff), newcolname))
}

When I apply this function to a single variable in the mtcars dataset, the output is as expected (although obviously the result itself is nonsensical for this data).

calculate_delta(mtcars, "wt")

Attempt to Apply the Function to a Character Vector Using Purrr

I think that I'm having trouble conceptualizing how map_at passes arguments to the function. All of the example snippets I can find online use map_at with functions like is.character, which don't require additional arguments. Here are my attempts at applying the function using purrr.

vars <- c("wt", "mpg")
mtcars %>% map_at(vars, calculate_delta)

This gives me this error message

Error in paste("d", col, sep = "") : argument "col" is missing, with no default

I assume this is because map_at is passing vars as the df, and not passing an argument for col. To get around that issue, I tried the following:

vars <- c("wt", "mpg") 
mtcars %>% map_at(vars, calculate_delta, df = .)

That throws me this error:

Error: unrecognised index type

I've monkeyed around with a bunch of different versions, including removing the df argument from the calculate_delta function, but I have had no luck.
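
A quick way to see what `map_at` actually passes along is to swap in a probe function. The sketch below assumes a current purrr release; `describe_arg` is a hypothetical helper, not part of purrr. It suggests that each selected column (a numeric vector) arrives as the first argument, rather than the data frame or the column name, which would explain why `col` is reported as missing.

```r
library(purrr)

vars <- c("wt", "mpg")

# Hypothetical probe: reports the class and length of whatever it receives
describe_arg <- function(x) paste(class(x), "vector of length", length(x))

probe <- map_at(mtcars, vars, describe_arg)

probe$wt   # "numeric vector of length 32": the function saw the column itself
probe$cyl  # not in vars, so the original numeric column is passed through
```

Under that reading, a function meant for `map_at` should take a vector as its first argument, not a data frame plus a column name.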

Other potential solutions

1) A version of this using sapply, rather than purrr. I've tried solving the problem that way and had similar trouble. And my goal is to figure out a way to do this using purrr, if that is possible. Based on my understanding of purrr, this seems like a typical use case.

2) I can obviously think of how I would implement this using a for loop, but I'm trying to avoid that if possible for similar reasons.

Clearly I'm thinking about this wrong. Please help!

EDIT 1

To clarify, I am curious if there is a method of repeatedly transforming variables that accomplishes the following.

1) Generates new variables within the original tbl_df without replacing the columns being mutated (as is the case when using dplyr's mutate_at).

2) Automatically generates new variable labels.

3) If possible, accomplishes what I've described by applying a single function using map_at.

It may be that this is not possible, but I feel like there should be an elegant way to accomplish what I am describing.

Sean Williams
  • Your function isn't ready to be placed in `mutate` or similar structure. Try `mtcars %>% mutate(calculate_delta(wt))` to see that even without `purrr` or `map` it doesn't work. If it doesn't work with a normal `dplyr` call, it won't work in that structure. It should be re-written. You can start by removing the necessity of data frame specification. Think about how `sum` or `mean` don't require a data frame as part of the call, they are built for vectors. – Pierre L Aug 30 '16 at 04:15
  • Thank you, this is a helpful way to think about this issue. This function, from @PierreLafortune below, works as part of a dplyr mutate call: `delta <- function(x) (x + dplyr::lag(x)) /x` and it also works with `purrr`. As I mentioned below, the part that is tripping me up is dynamically renaming the variables. – Sean Williams Aug 31 '16 at 02:45

1 Answer


Try simplifying the process:

delta <- function(x) (x + dplyr::lag(x)) /x
cols <- c("wt", "mpg")

#This
library(dplyr)
mtcars %>% mutate_at(cols, delta)
#Or
library(purrr)
mtcars %>% map_at(cols, delta)

#If necessary, in a function
f <- function(df, cols) {
  df %>% mutate_at(cols, delta)
}

f(iris, c("Sepal.Width", "Petal.Length"))
f(mtcars, c("wt", "mpg"))
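
One difference between the two options is worth checking: in current purrr releases, `map_at` appears to return a plain list rather than a data frame, while `modify_at` keeps the type of its input. The sketch below assumes a recent purrr (one that provides `modify_at`) and reuses `delta` and `cols` from above.

```r
library(purrr)

delta <- function(x) (x + dplyr::lag(x)) / x
cols <- c("wt", "mpg")

# map_at: transformed columns come back inside a list
as_list <- map_at(mtcars, cols, delta)
is.data.frame(as_list)

# modify_at: same transformation, but the result stays a data frame
as_df <- modify_at(mtcars, cols, delta)
is.data.frame(as_df)
```

If a data frame is needed downstream, `mutate_at` or `modify_at` saves the conversion step.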

Edit

If you would like to embed new names after, we can write a custom pipe-ready function:

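Rename <- function(object, old, new) {
  # match() keeps each old name paired with its new name,
  # regardless of the column order in `object`
  names(object)[match(old, names(object))] <- new
  object
}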

mtcars %>% 
  mutate_at(cols, delta) %>% 
  Rename(cols, paste0("lagged",cols))

If you want to keep the original columns and add the lagged variables under new names (each gets a `_lagged` suffix):

mtcars %>% mutate_at(cols, funs(lagged = delta))
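
Since `funs()` has been deprecated in later dplyr releases, a possible modern equivalent is sketched here. It assumes dplyr 1.0.0 or later, where `across()` accepts a `.names` template; the `d` prefix mirrors the naming in the question and is only one possible choice.

```r
library(dplyr)

delta <- function(x) (x + dplyr::lag(x)) / x
cols <- c("wt", "mpg")

# Keep wt and mpg as they are, and append dwt and dmpg as new columns
result <- mtcars %>%
  mutate(across(all_of(cols), delta, .names = "d{.col}"))

names(result)
```

The original columns are left untouched, and the new names are generated from the character vector in a single step.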
Weihuang Wong
Pierre L
  • Thank you for your response. These solutions mostly produce the result I'm looking for, but they do so by replacing the original variables with the lagged variables. [This post](http://stackoverflow.com/questions/38340180/automatically-generate-new-variable-names-using-dplyr-mutate) shows one way to dynamically rename the variable within `mutate_each`, but I can't pass a character vector as an argument to `vars`. – Sean Williams Aug 31 '16 at 02:44
  • You don't have to dynamically rename. Just rename it after. Or if you need it in the pipe write a custom function. – Pierre L Aug 31 '16 at 02:59
  • Thanks again, Pierre. The method you describe has the disadvantage of replacing the variables being mutated with the lagged variables. As I describe in "Edit 1" of my original post, my goal is to apply the function without replacing the original variables, while dynamically generating names in a single step. – Sean Williams Aug 31 '16 at 14:51
  • @SeanWilliams `mutate_at` doesn't have to replace columns if you give a suffix name to add: `mtcars %>% mutate_at(cols, funs(lagged = delta))` – aosmith Aug 31 '16 at 15:57
  • @aosmith fantastic, this is exactly what I was looking for. Thank you! – Sean Williams Aug 31 '16 at 21:18