
I have a dataset with the following structure:

Classes ‘tbl_df’ and 'data.frame':  10 obs. of  7 variables:
 $ GdeName  : chr  "Aeugst am Albis" "Aeugst am Albis" "Aeugst am Albis" "Aeugst am Albis" ...
 $ Partei   : chr  "BDP" "CSP" "CVP" "EDU" ...
 $ Stand1971: num  NA NA 4.91 NA 3.21 ...
 $ Stand1975: num  NA NA 5.389 0.438 4.536 ...
 $ Stand1979: num  NA NA 6.2774 0.0195 3.4355 ...
 $ Stand1983: num  NA NA 4.66 1.41 3.76 ...
 $ Stand1987: num  NA NA 3.48 1.65 5.75 ...

I want to write a function that computes the difference between any two of the Stand columns, and I would like to do this using dplyr's mutate function, like so (assume the parameters from and to are passed as arguments):

from <- "Stand1971"
to <- "Stand1987"

data %>%
  mutate(diff = from - to)

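Of course, this doesn't work, because dplyr uses non-standard evaluation: from and to are not columns, so mutate picks them up from the environment as the strings "Stand1971" and "Stand1987" and tries to subtract one string from the other. I know there's now an elegant solution to the problem using mutate_, and I've read this vignette, but I still can't get my head around it.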

What to do?

Here are the first few rows of the dataset for a reproducible example:

structure(list(GdeName = c("Aeugst am Albis", "Aeugst am Albis", 
"Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis", 
"Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis"
), Partei = c("BDP", "CSP", "CVP", "EDU", "EVP", "FDP", "FGA", 
"FPS", "GLP", "GPS"), Stand1971 = c(NA, NA, 4.907306434, NA, 
3.2109535926, 18.272143463, NA, NA, NA, NA), Stand1975 = c(NA, 
NA, 5.389079711, 0.4382328556, 4.5363022622, 18.749259742, NA, 
NA, NA, NA), Stand1979 = c(NA, NA, 6.2773722628, 0.0194647202, 
3.4355231144, 25.294403893, NA, NA, NA, 2.7055961071), Stand1983 = c(NA, 
NA, 4.6609804428, 1.412940467, 3.7563539244, 26.277246489, 0.8529335746, 
NA, NA, 2.601878177), Stand1987 = c(NA, NA, 3.4767860929, 1.6535933856, 
5.7451770193, 22.146844746, NA, 3.7453183521, NA, 13.702211858
)), .Names = c("GdeName", "Partei", "Stand1971", "Stand1975", 
"Stand1979", "Stand1983", "Stand1987"), class = c("tbl_df", "data.frame"
), row.names = c(NA, -10L))
grssnbchr
  • It doesn't answer your question, but guessing from the context, you might be better off with a tidy data set, where you could just use `lead(x) - x` to compute the differences between subsequent values for all years at once. – hadley Apr 16 '15 at 15:59
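A minimal sketch of the tidy-data approach suggested in the comment above, assuming tidyr 1.0 or later (for pivot_longer()). The function name tidy_diffs and the column names year, share and change are hypothetical choices for illustration. Each party in each municipality becomes a group, the years are sorted, and lead(share) - share gives the change to the next recorded year.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: reshape the wide Stand* columns into one row per
# (GdeName, Partei, year), then compute the change to the next year.
tidy_diffs <- function(data) {
  data %>%
    pivot_longer(
      cols = starts_with("Stand"),
      names_to = "year",
      names_prefix = "Stand",   # "Stand1971" -> "1971"
      values_to = "share"
    ) %>%
    mutate(year = as.integer(year)) %>%
    group_by(GdeName, Partei) %>%
    arrange(year, .by_group = TRUE) %>%
    # lead(share) is the next year's value; the last year gets NA
    mutate(change = lead(share) - share) %>%
    ungroup()
}
```

With this layout, the differences for all year pairs come out of one call, e.g. tidy_diffs(data), instead of one mutate() per pair of column names.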

2 Answers


With dplyr 0.7 or later, you can use the rlang !! (bang-bang) operator.

library(tidyverse)
from <- "Stand1971"
to <- "Stand1987"

data %>%
  mutate(diff=(!!as.name(from))-(!!as.name(to)))

You just need to convert the strings to names with as.name and then insert them into the expression. Unfortunately I seem to need more parentheses than I would like: R's parser reads !! as two nested ! (logical not) operators, which have lower precedence than -, so the parentheses make the intended grouping explicit.
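Since the question asks for a function, here is a minimal sketch that wraps the answer's mutate() call. The name compute_diff is hypothetical; from and to are assumed to be strings naming existing numeric columns.

```r
library(dplyr)

# Hypothetical helper: computes from - to, where both are column names
# given as strings. as.name() turns each string into a symbol, and !!
# splices that symbol into the mutate() expression before evaluation.
compute_diff <- function(data, from, to) {
  data %>%
    mutate(diff = (!!as.name(from)) - (!!as.name(to)))
}
```

Usage with the question's data would be compute_diff(data, "Stand1971", "Stand1987"). Rows where either column is NA get NA in diff, as with plain subtraction.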

Original answer, for dplyr 0.3 up to (but not including) 0.7:

As described in that vignette (vignette("nse", "dplyr")), use lazyeval's interp() function:

library(lazyeval)

from <- "Stand1971"
to <- "Stand1987"

# mutate_() is deprecated in current dplyr; this form is for dplyr < 0.7
data %>%
  mutate_(diff=interp(~from - to, from=as.name(from), to=as.name(to)))
MrFlick
  • Why is this approach more "sexy" (or preferred) than using `paste`? – grssnbchr Apr 16 '15 at 15:18
  • interp() helps to capture the appropriate environments as well, which is more important when you have more complicated scoping or non-base functions. – MrFlick Apr 16 '15 at 15:21
  • @wnstnsmth As well as capturing environments, the interp approach will always work regardless of the names of the variables. Using paste is just putting a ticking time bomb into your code. – hadley Apr 16 '15 at 15:58
  • What if I want my new column name (diff in this example) to be dynamic as well? The same construction doesn't seem to work on the LHS of the mutate assignment. – DanTan Feb 05 '19 at 22:05
  • @DanTan Use `mutate(!!diff :=(!!as.name(from))-(!!as.name(to)))`. The `:=` allows you to change the name of the new column on the left of the equals. See https://stackoverflow.com/q/26003574/2372064 – MrFlick Feb 05 '19 at 22:07
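A sketch combining the two comment threads above: the new column's name is a string injected with !!new_col :=, and the input columns are built with as.name(). The helper name compute_diff_named and the example column names containing a space are hypothetical. The example illustrates hadley's point about paste: because as.name() builds the symbol directly, non-syntactic names still work, whereas pasting "Stand 1971 - Stand 1987" into a string and parsing it would fail on the space.

```r
library(dplyr)

# Hypothetical helper: like the answer's code, but the name of the new
# column is also passed as a string and injected on the left of :=.
compute_diff_named <- function(data, from, to, new_col = "diff") {
  data %>%
    mutate(!!new_col := (!!as.name(from)) - (!!as.name(to)))
}

# Column names with a space are not syntactic R names, but as.name()
# still produces a valid symbol for them.
df <- data.frame(`Stand 1971` = c(4, 3), `Stand 1987` = c(1, 5),
                 check.names = FALSE)
compute_diff_named(df, "Stand 1971", "Stand 1987", new_col = "change")
```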

You can now use the .data pronoun inside a dplyr chain:

library(dplyr)
from <- "Stand1971"
to <- "Stand1987"

data %>% mutate(diff = .data[[from]] - .data[[to]])
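A minimal sketch of the .data version packaged as a function (the name compute_diff_data is hypothetical). One practical property worth showing: .data[[from]] looks the string up only among the columns of the data, so a misspelt column name raises an error rather than silently matching some variable in the calling environment.

```r
library(dplyr)

# Hypothetical helper using the .data pronoun: .data[[from]] selects the
# column whose name is stored in the string `from`.
compute_diff_data <- function(data, from, to) {
  data %>%
    mutate(diff = .data[[from]] - .data[[to]])
}
```

For example, compute_diff_data(data, "Stand1971", "Stand9999") stops with a "not found" error instead of producing a column of unexpected values.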

Another option is to use sym with bang-bang (!!):

data %>% mutate(diff = !!sym(from) - !!sym(to))

In base R, we can use:

data$diff <- data[[from]] - data[[to]]
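The base R line above can be wrapped in a function with no dplyr dependency. This is a sketch; the name compute_diff_base, the new_col argument and the up-front column check are hypothetical additions. Since [[ ]] takes the column name as a string, no non-standard evaluation is involved at all.

```r
# Hypothetical base R helper: computes data[[from]] - data[[to]] and stores
# it in a column named by new_col. Stops early if a column is missing.
compute_diff_base <- function(data, from, to, new_col = "diff") {
  missing_cols <- setdiff(c(from, to), names(data))
  if (length(missing_cols) > 0) {
    stop("Unknown column(s): ", paste(missing_cols, collapse = ", "))
  }
  data[[new_col]] <- data[[from]] - data[[to]]
  data
}
```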
Ronak Shah
  • This answer is perfect; however, in other parts of the code I use the glue syntax `{var}` to do this, and it doesn't work in this context. Is there a .data equivalent in the glue syntax that dplyr can use now? – Keipi Aug 03 '21 at 09:33
  • Have you tried with `{.data[var]}` ? You can maybe ask a new question regarding your specific case. – Ronak Shah Aug 03 '21 at 10:18
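Regarding the glue question above, a sketch under the assumption of rlang 0.4.3 or later, where tidy eval functions accept a glue string on the left of :=. There, "{from}" is replaced by the value of the string variable from, so glue syntax can name the new column, while on the right-hand side .data[[from]] plays the role of looking up a column by its string name. The column name pattern diff_{from}_{to} is a hypothetical choice.

```r
library(dplyr)

from <- "Stand1971"
to <- "Stand1987"

df <- data.frame(Stand1971 = c(4, NA), Stand1987 = c(1, 2))

# Glue string on the left of := names the new column
# ("diff_Stand1971_Stand1987"); .data[[ ]] selects the columns on the right.
df %>%
  mutate("diff_{from}_{to}" := .data[[from]] - .data[[to]])
```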