Here's a faster solution using base R code. Strategy is convert to a matrix, subtract column one from the required columns, build back into a data frame. Note this only returns the modified columns - if there are columns not in vars_to_process
they'll not appear in the output but you didn't have any of those in your test set so I'll assume they don't exist.
So, always write things in functions whenever possible:
bsr = function(df,vars_to_process){
m = as.matrix(df)
data.frame(
A = m[, 1],
m[, 1] - m[, vars_to_process])}
Make some test data:
> df = data.frame(matrix(runif(5*1000), ncol=5))
> names(df)=LETTERS[1:5]
> dft = as.tibble(df)
> head(dft)
# A tibble: 6 x 5
A B C D E
<dbl> <dbl> <dbl> <dbl> <dbl>
1 0.2609174 0.07857624 0.2727817 0.8498004 0.3403234
2 0.3644744 0.95810657 0.8183856 0.2958133 0.4752349
3 0.6042914 0.98793218 0.7547003 0.9596591 0.5354045
4 0.4000441 0.61403331 0.9018804 0.3838347 0.3266855
5 0.6767012 0.11984219 0.9181570 0.5988404 0.6058629
Compare with the tidyverse version:
akr = function(df,vars_to_process){
df %>% mutate_at(vars_to_process, funs(r_diff(.,df[[1]])))
}
Check bsr
and akr
agree:
> head(bsr(dft, vars_to_process))
A B C D E
1 0.2609174 0.1823412 -0.01186432 -0.58888295 -0.07940594
2 0.3644744 -0.5936322 -0.45391119 0.06866108 -0.11076050
3 0.6042914 -0.3836408 -0.15040892 -0.35536765 0.06888696
4 0.4000441 -0.2139892 -0.50183635 0.01620939 0.07335861
> head(akr(dft, vars_to_process))
# A tibble: 6 x 5
A B C D E
<dbl> <dbl> <dbl> <dbl> <dbl>
1 0.2609174 0.1823412 -0.01186432 -0.58888295 -0.07940594
2 0.3644744 -0.5936322 -0.45391119 0.06866108 -0.11076050
3 0.6042914 -0.3836408 -0.15040892 -0.35536765 0.06888696
4 0.4000441 -0.2139892 -0.50183635 0.01620939 0.07335861
okay, except akr
returns a tribble but nm. Benchmark:
> microbenchmark(bsr(dft, vars_to_process),akr(dft, vars_to_process))
Unit: microseconds
expr min lq mean median uq
bsr(dft, vars_to_process) 362.117 388.7215 488.9309 446.123 521.776
akr(dft, vars_to_process) 8070.391 8365.4230 9853.5239 8673.692 9335.613
Base R version is 26 times faster. I'd also argue that subtracting a column from another set of columns is tidier than applying a mutator function but as long as you wrap what your doing in a function it doesn't matter how messy the guts are.