4

Suppose I have a data frame like this:

df = data.frame(preA = c(1,2,3),preB = c(3,4,5),postA = c(6,7,8),postB = c(9,8,4))

I want to add columns having column-wise differences, that is:

diffA = postA - preA
diffB = postB - preB 

and so on...
Is there an efficient way to do this in tidyverse?

user438383
  • 5,716
  • 8
  • 28
  • 43
Ravi
  • 41
  • 3

3 Answers3

2

The way to go with dplyr and tidyr:

library(dplyr)
library(tidyr)
df %>% 
  mutate(id = 1:n()) %>% 
  pivot_longer(-id,
               names_to = c("pre_post", ".value"),
               names_pattern = "(pre|post)(.*)") %>% 
  group_by(id) %>% 
  mutate(across(A:B, diff, .names = "diff{col}")) %>% 
  pivot_wider(names_from = pre_post, values_from = c(A, B),
              names_glue = '{pre_post}{.value}') %>% 
  select(id, starts_with("pre"), starts_with("post"), starts_with("diff"))

#      id  preA  preB postA postB diffA diffB
# 1     1     1     3     6     9     5     6
# 2     2     2     4     7     8     5     4
# 3     3     3     5     8     4     5    -1

A shorter but less flexible was with dplyover::across2:

library(dplyr)
library(dplover)
df %>% 
  #relocate(sort(colnames(.))) %>% 
  mutate(across2(starts_with("post"), starts_with("pre"), `-`,
                 .names = "diff{idx}"))

#      preA  preB postA postB diff1 diff2
# 1       1     3     6     9     5     6
# 2       2     4     7     8     5     4
# 3       3     5     8     4     5    -1
Maël
  • 45,206
  • 3
  • 29
  • 67
2

You can do this with two uses of across(), creating new variables with the first use and subtracting the second. This also assumes your columns are in order.

df %>%
  mutate(across(starts_with("post"), .names = "diff{sub('post', '', .col)}") - across(starts_with("pre")))

  preA preB postA postB diffA diffB
1    1    3     6     9     5     6
2    2    4     7     8     5     4
3    3    5     8     4     5    -1
Ritchie Sacramento
  • 29,890
  • 4
  • 48
  • 56
  • 1
    Great one! did not know it was possible +1; definitely the way to go – Maël Oct 14 '22 at 14:00
  • I like this too... But I think a major issue is that it's not at all easy to check that the two instances of `across()` are returning the correct subsets of `df`, meaning hard-to-detect bugs could easily be introduced. – wurli Oct 14 '22 at 14:23
  • Thank you much. I like this one the best because I am fairly comfortable with across – Ravi Oct 14 '22 at 14:30
2

A few more solutions. My favourite is the first one demonstrated here - I think it's the cleanest and most debuggable:

# Setup:
library(dplyr, warn.conflicts = FALSE)
library(glue)

df <- data.frame(
  preA = c(1,2,3), 
  preB = c(3,4,5), 
  postA = c(6,7,8), 
  postB = c(9,8,4)
)

Method 1: Using expressions:

This is my favourite approach. I think it's very readable, and I think it should be reasonably fast compared to solutions using across():

cols         <- c("A", "B")
exprs        <- glue("post{cols} - pre{cols}")
names(exprs) <- glue("diff{cols}")

df |> 
  mutate(!!!rlang::parse_exprs(exprs))
#>   preA preB postA postB diffA diffB
#> 1    1    3     6     9     5     6
#> 2    2    4     7     8     5     4
#> 3    3    5     8     4     5    -1

Method 2: Using mutate() + across() + get():

Personally, I don't like this sort of thing because I think it's really hard to read:

df |> 
  mutate(across(
    starts_with("post"),
    ~ .x - get(stringr::str_replace_all(cur_column(), "^post", "pre")),
    .names = "diff{stringr::str_remove(.col, '^post')}"
  ))
#>   preA preB postA postB diffA diffB
#> 1    1    3     6     9     5     6
#> 2    2    4     7     8     5     4
#> 3    3    5     8     4     5    -1

Method 3: Using base subsetting:

The main advantage here is that you don't need any packages (you can use paste0() instead of glue()), IMO it's also pretty readable. But I don't like that it doesn't play well with |>:

cols <- c("A", "B")
df2  <- df
df2[glue("diff{cols}")] <- df2[glue("post{cols}")] - df2[glue("pre{cols}")]

df2
#>   preA preB postA postB diffA diffB
#> 1    1    3     6     9     5     6
#> 2    2    4     7     8     5     4
#> 3    3    5     8     4     5    -1
wurli
  • 2,314
  • 10
  • 17