Tidyverse column-wise differences

Question

Suppose I have a data frame like this:

df = data.frame(preA = c(1,2,3),preB = c(3,4,5),postA = c(6,7,8),postB = c(9,8,4))

I want to add columns having column-wise differences, that is:

diffA = postA - preA
diffB = postB - preB

and so on...
Is there an efficient way to do this in tidyverse?

Maël · Answer 1 · 2022-10-14T13:40:57.020

The way to go with dplyr and tidyr:

library(dplyr)
library(tidyr)
df %>% 
  mutate(id = 1:n()) %>% 
  pivot_longer(-id,
               names_to = c("pre_post", ".value"),
               names_pattern = "(pre|post)(.*)") %>% 
  group_by(id) %>% 
  mutate(across(A:B, diff, .names = "diff{col}")) %>% 
  pivot_wider(names_from = pre_post, values_from = c(A, B),
              names_glue = '{pre_post}{.value}') %>% 
  select(id, starts_with("pre"), starts_with("post"), starts_with("diff"))

#      id  preA  preB postA postB diffA diffB
# 1     1     1     3     6     9     5     6
# 2     2     2     4     7     8     5     4
# 3     3     3     5     8     4     5    -1

A shorter but less flexible was with dplyover::across2:

library(dplyr)
library(dplover)
df %>% 
  #relocate(sort(colnames(.))) %>% 
  mutate(across2(starts_with("post"), starts_with("pre"), `-`,
                 .names = "diff{idx}"))

#      preA  preB postA postB diff1 diff2
# 1       1     3     6     9     5     6
# 2       2     4     7     8     5     4
# 3       3     5     8     4     5    -1

I am not able to find dplyover package - I am getting an error "package ‘dplyover’ is not available for this version of R" — Ravi, Oct 14 '22 at 13:40
`dplyover` is not on CRAN yet, it needs to be installed from github. — Ritchie Sacramento, Oct 14 '22 at 13:59

score 2 · Answer 2 · answered Oct 14 '22 at 13:53

2

You can do this with two uses of across(), creating new variables with the first use and subtracting the second. This also assumes your columns are in order.

df %>%
  mutate(across(starts_with("post"), .names = "diff{sub('post', '', .col)}") - across(starts_with("pre")))

  preA preB postA postB diffA diffB
1    1    3     6     9     5     6
2    2    4     7     8     5     4
3    3    5     8     4     5    -1

answered Oct 14 '22 at 13:53

Ritchie Sacramento

29,890
4
48
56

1

Great one! did not know it was possible +1; definitely the way to go – Maël Oct 14 '22 at 14:00
I like this too... But I think a major issue is that it's not at all easy to check that the two instances of `across()` are returning the correct subsets of `df`, meaning hard-to-detect bugs could easily be introduced. – wurli Oct 14 '22 at 14:23
Thank you much. I like this one the best because I am fairly comfortable with across – Ravi Oct 14 '22 at 14:30

wurli · Answer 3 · 2022-10-14T14:43:16.900

A few more solutions. My favourite is the first one demonstrated here - I think it's the cleanest and most debuggable:

# Setup:
library(dplyr, warn.conflicts = FALSE)
library(glue)

df <- data.frame(
  preA = c(1,2,3), 
  preB = c(3,4,5), 
  postA = c(6,7,8), 
  postB = c(9,8,4)
)

Method 1: Using expressions:

This is my favourite approach. I think it's very readable, and I think it should be reasonably fast compared to solutions using across():

cols         <- c("A", "B")
exprs        <- glue("post{cols} - pre{cols}")
names(exprs) <- glue("diff{cols}")

df |> 
  mutate(!!!rlang::parse_exprs(exprs))
#>   preA preB postA postB diffA diffB
#> 1    1    3     6     9     5     6
#> 2    2    4     7     8     5     4
#> 3    3    5     8     4     5    -1

Method 2: Using `mutate()` + `across()` + `get()`:

Personally, I don't like this sort of thing because I think it's really hard to read:

df |> 
  mutate(across(
    starts_with("post"),
    ~ .x - get(stringr::str_replace_all(cur_column(), "^post", "pre")),
    .names = "diff{stringr::str_remove(.col, '^post')}"
  ))
#>   preA preB postA postB diffA diffB
#> 1    1    3     6     9     5     6
#> 2    2    4     7     8     5     4
#> 3    3    5     8     4     5    -1

Method 3: Using base subsetting:

The main advantage here is that you don't need any packages (you can use paste0() instead of glue()), IMO it's also pretty readable. But I don't like that it doesn't play well with |>:

cols <- c("A", "B")
df2  <- df
df2[glue("diff{cols}")] <- df2[glue("post{cols}")] - df2[glue("pre{cols}")]

df2
#>   preA preB postA postB diffA diffB
#> 1    1    3     6     9     5     6
#> 2    2    4     7     8     5     4
#> 3    3    5     8     4     5    -1

Tidyverse column-wise differences

3 Answers3

Method 1: Using expressions:

Method 2: Using mutate() + across() + get():

Method 3: Using base subsetting:

Method 2: Using `mutate()` + `across()` + `get()`: