
I am looking for a tidy/dplyr approach to compute the difference between every ordered pair of columns in a data frame, so that both directions are included (e.g. A-B and B-A).

I start with df and would like to end with end_df:

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.2.1
#> Warning: package 'tibble' was built under R version 4.2.1

df <- tibble(A = rnorm(10),
             B = rnorm(10),
             C = rnorm(10))
print(df)
#> # A tibble: 10 × 3
#>          A       B       C
#>      <dbl>   <dbl>   <dbl>
#>  1 -0.292   1.27    0.783 
#>  2 -1.11    0.254  -0.410 
#>  3  2.05    1.67    1.35  
#>  4  1.31    0.0329 -1.29  
#>  5 -1.67   -0.379  -0.696 
#>  6 -1.02   -0.686   1.43  
#>  7 -0.291  -0.0728  0.336 
#>  8 -0.507   0.350   1.70  
#>  9 -0.707   0.961  -0.493 
#> 10  0.0459 -0.299  -0.0113


end_df <- df %>% 
  mutate( "A-B" = A-B,
          "A-C" = A-C,
          "B-A" = B-A,
          "B-C" = B-C,
          "C-A" = C-A,
          "C-B" = C-B)

print(end_df)
#> # A tibble: 10 × 9
#>          A       B       C  `A-B`   `A-C`  `B-A`  `B-C`   `C-A`  `C-B`
#>      <dbl>   <dbl>   <dbl>  <dbl>   <dbl>  <dbl>  <dbl>   <dbl>  <dbl>
#>  1 -0.292   1.27    0.783  -1.56  -1.08    1.56   0.482  1.08   -0.482
#>  2 -1.11    0.254  -0.410  -1.37  -0.703   1.37   0.664  0.703  -0.664
#>  3  2.05    1.67    1.35    0.380  0.702  -0.380  0.321 -0.702  -0.321
#>  4  1.31    0.0329 -1.29    1.28   2.60   -1.28   1.33  -2.60   -1.33 
#>  5 -1.67   -0.379  -0.696  -1.29  -0.975   1.29   0.317  0.975  -0.317
#>  6 -1.02   -0.686   1.43   -0.334 -2.44    0.334 -2.11   2.44    2.11 
#>  7 -0.291  -0.0728  0.336  -0.218 -0.627   0.218 -0.409  0.627   0.409
#>  8 -0.507   0.350   1.70   -0.857 -2.20    0.857 -1.35   2.20    1.35 
#>  9 -0.707   0.961  -0.493  -1.67  -0.215   1.67   1.45   0.215  -1.45 
#> 10  0.0459 -0.299  -0.0113  0.345  0.0572 -0.345 -0.288 -0.0572  0.288

Created on 2022-09-05 by the reprex package (v2.0.1)

dcsuka
CyG

1 Answer


You can build a table of all ordered pairs of column names, compute each difference as a one-column tibble, and then bind those columns onto the original data frame:

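# All ordered pairs of column names, dropping self-pairs such as A-A.
# stringsAsFactors = FALSE keeps the names as character rather than factor.
pairs <- expand.grid(names(df), names(df), stringsAsFactors = FALSE) %>%
  filter(Var1 != Var2)

# One single-column tibble per pair, named "x-y", bound onto df
map2(pairs$Var1, pairs$Var2, function(x, y) as_tibble_col(df[[x]] - df[[y]], str_c(x, "-", y))) %>%
  bind_cols(df, .)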

# # A tibble: 10 × 9
#          A       B       C   `B-A`  `C-A`   `A-B`   `C-B`  `A-C`   `B-C`
#      <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>   <dbl>  <dbl>   <dbl>
#  1  0.199   0.110   0.0148 -0.0895 -0.184  0.0895 -0.0948  0.184  0.0948
#  2 -0.851  -0.413   0.338   0.438   1.19  -0.438   0.751  -1.19  -0.751 
#  3 -1.13    0.112  -1.97    1.24   -0.835 -1.24   -2.08    0.835  2.08  
#  4  0.597  -2.89   -2.32   -3.49   -2.92   3.49    0.572   2.92  -0.572 
#  5 -1.10    0.0953  0.996   1.19    2.09  -1.19    0.900  -2.09  -0.900 
#  6  0.0191  0.500   1.17    0.481   1.15  -0.481   0.667  -1.15  -0.667 
#  7  0.416   0.949  -0.865   0.533  -1.28  -0.533  -1.81    1.28   1.81  
#  8  1.84   -1.66   -1.39   -3.50   -3.23   3.50    0.267   3.23  -0.267 
#  9  0.406  -1.48   -1.33   -1.89   -1.74   1.89    0.149   1.74  -0.149 
# 10  0.393  -0.491  -0.139  -0.884  -0.532  0.884   0.352   0.532 -0.352 
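If the goal is to keep everything inside a single `mutate()` call, the same pairs table can be turned into a list of unevaluated expressions and spliced in with `!!!`. This is only a sketch under a few assumptions: the seed is added for reproducibility, `diff_exprs` and `result` are illustrative names, and `rlang::expr()` is called explicitly rather than relying on a re-export.

```r
library(dplyr)

set.seed(42)  # assumption: seed added only to make the sketch reproducible
df <- tibble(A = rnorm(10), B = rnorm(10), C = rnorm(10))

# All ordered pairs of column names, without self-pairs
pairs <- expand.grid(names(df), names(df), stringsAsFactors = FALSE) %>%
  filter(Var1 != Var2)

# One unevaluated expression per pair, e.g. .data[["B"]] - .data[["A"]]
diff_exprs <- Map(
  function(x, y) rlang::expr(.data[[!!x]] - .data[[!!y]]),
  pairs$Var1, pairs$Var2
)
names(diff_exprs) <- paste(pairs$Var1, pairs$Var2, sep = "-")

# Splice the named expressions into a single mutate() call
result <- df %>% mutate(!!!diff_exprs)
print(result)
```

The new columns come out in the order produced by `expand.grid()` (B-A, C-A, A-B, ...); sorting `pairs` first, for example with `arrange(Var1, Var2)`, would give the A-B, A-C, B-A, ... order from the question.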
dcsuka
  • Thanks, that's useful, but I would prefer a `dplyr` approach if doable – CyG Sep 05 '22 at 22:49
  • Basically this entire solution is in `tidyverse` except `expand.grid`, which can be replaced by `purrr::cross_df`, and the selections, which can all be replaced by `dplyr::select`. Is there something fundamentally untidy about this? To the best of my knowledge, `dplyr` doesn't have any permutation capabilities, so you will have to deviate somewhat as this is largely a permutation problem. – dcsuka Sep 06 '22 at 00:36
  • I heard from a previous post that there was a c_across option that could potentially be done within a `dplyr::mutate()`? – CyG Sep 06 '22 at 10:53
  • Maybe that could work, but I'm not sure how to do it. You can experiment with `combn` too, even though it's not dplyr: https://stackoverflow.com/questions/16919998/subtract-every-column-from-each-other-column-in-a-r-data-table – dcsuka Sep 06 '22 at 18:08
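For what it's worth, the `combn` idea from the last comment could look roughly like the sketch below. `combn()` returns each unordered pair only once, so the reverse differences are obtained here by negating the forward ones. The seed and the names `pair_mat`, `fwd`, `bwd` and `result` are assumptions made for illustration, not part of the linked answer.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
df <- tibble(A = rnorm(10), B = rnorm(10), C = rnorm(10))

# Each unordered pair once, as a 2-row character matrix: (A,B), (A,C), (B,C)
pair_mat <- combn(names(df), 2)

# Forward differences: first name minus second name
fwd <- lapply(seq_len(ncol(pair_mat)), function(i) {
  df[[pair_mat[1, i]]] - df[[pair_mat[2, i]]]
})
names(fwd) <- paste(pair_mat[1, ], pair_mat[2, ], sep = "-")

# Reverse differences are the forward ones with the sign flipped
bwd <- lapply(fwd, function(v) -v)
names(bwd) <- paste(pair_mat[2, ], pair_mat[1, ], sep = "-")

result <- bind_cols(df, as_tibble(c(fwd, bwd)))
print(result)
```

This computes half as many subtractions as the full grid of ordered pairs, at the cost of a column order (forward pairs first, then reversed pairs) that differs from `end_df`.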