
I have an R data frame as in the following example. I wish to calculate the differences in the column values between observations/rows, for all combinations of rows.

library(tibble)
my_df <- tibble(a=runif(5), b=runif(5), c=runif(5))

> my_df
# A tibble: 5 x 3
       a     b      c
   <dbl> <dbl>  <dbl>
1 0.0513 0.267 0.846 
2 0.614  0.683 0.937 
3 0.230  0.700 0.0651
4 0.671  0.110 0.901 
5 0.424  0.520 0.817 

I have tried the code below, which gives me only the differences between consecutive rows. I want all combinations: row2 - row1, row3 - row1, row4 - row1, row5 - row1, row3 - row2, row4 - row2, and so on.

Also, I doubt the code I wrote is the best approach (!): it produces the kind of result I want, but only for consecutive rows, not for all possible combinations.

my_diff <- as.data.frame(diff(as.matrix(my_df)))
> my_diff
           a           b           c
1  0.5623574  0.41522579  0.09165630
2 -0.3837289  0.01755953 -0.87209740
3  0.4407068 -0.58982681  0.83540813
4 -0.2463205  0.40943495 -0.08358985
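The `diff()` call above uses the default `lag = 1`, which is why only consecutive rows are subtracted. `diff(x, lag = k)` subtracts rows that are `k` apart, so stacking the results for every lag from 1 to n - 1 covers every pair. A minimal base-R sketch, assuming all columns are numeric; it uses a plain `data.frame` so it runs without packages (`as.matrix()` behaves the same on a tibble), and the `"later-earlier"` row labels are a hypothetical convention added only for readability:

```r
set.seed(1)
my_df <- data.frame(a = runif(5), b = runif(5), c = runif(5))

m <- as.matrix(my_df)
n <- nrow(m)

# diff(m, lag = k) returns rows (k + 1):n minus rows 1:(n - k)
by_lag <- lapply(seq_len(n - 1), function(k) {
  d <- diff(m, lag = k)
  # label each row as "later-earlier" (illustrative naming only)
  rownames(d) <- paste0((k + 1):n, "-", 1:(n - k))
  d
})

# Stack all lags: 4 + 3 + 2 + 1 = choose(5, 2) = 10 rows, ordered by lag
all_diffs <- do.call(rbind, by_lag)
all_diffs
```

The rows come out grouped by lag (2-1, 3-2, 4-3, 5-4, 3-1, ...) rather than in the row2 - row1, row3 - row1, ... order listed above; reorder them if that order matters.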

I would appreciate help solving this in R, if possible using tidyverse options.

Thanks.

FCL
  • Does this answer your question? [R - How can I generate difference of all combinations of columns in a data frame](https://stackoverflow.com/questions/41819797/r-how-can-i-generate-difference-of-all-combinations-of-columns-in-a-data-frame) – Skaqqs Oct 23 '21 at 18:58
  • @Skaqqs, thank you, but not really. I am trying to calculate the difference between rows and not columns. – FCL Oct 23 '21 at 20:42
  • Ok, sorry about that. I must not understand your desired output. Are you able to share what you are expecting based on your example data? Thanks! – Skaqqs Oct 23 '21 at 20:54
  • Crossposted: https://community.rstudio.com/t/r-calculate-the-differences-in-the-column-values-between-rows-observations-all-combinations/118783 – yarnabrina Oct 24 '21 at 04:25

2 Answers


UPDATE: A tidyverse-friendly solution:

library(tidyverse)
set.seed(1)
my_df <- tibble(a=runif(5), b=runif(5), c=runif(5))

gives:

# A tibble: 5 x 3
      a      b     c
  <dbl>  <dbl> <dbl>
1 0.266 0.898  0.206
2 0.372 0.945  0.177
3 0.573 0.661  0.687
4 0.908 0.629  0.384
5 0.202 0.0618 0.770

And from there:

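my_df %>%
  mutate(ID = row_number()) %>%                  # keep the original row numbers
  slice(as.numeric(t(combn(1:nrow(.), 2)))) %>%  # first member of every pair, then second member
  mutate(group = rep(1:(n()/2), 2)) %>%          # rows k and k + n()/2 belong to pair k
  group_by(group) %>%
  summarize(comparison = paste0(ID[2], "-", ID[1]),
            across(c(a, b, c), ~ .[2] - .[1])) %>%  # later row minus earlier row
  select(-group)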

which gives:

# A tibble: 10 x 4
   comparison       a       b       c
   <chr>        <dbl>   <dbl>   <dbl>
 1 2-1         0.107   0.0463 -0.0294
 2 3-1         0.307  -0.238   0.481 
 3 4-1         0.643  -0.269   0.178 
 4 5-1        -0.0638 -0.837   0.564 
 5 3-2         0.201  -0.284   0.510 
 6 4-2         0.536  -0.316   0.208 
 7 5-2        -0.170  -0.883   0.593 
 8 4-3         0.335  -0.0317 -0.303 
 9 5-3        -0.371  -0.599   0.0828
10 5-4        -0.707  -0.567   0.386  
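To see why the pairing in the pipeline works, here is a small base-R sketch of the index vector passed to `slice()` and of the `group` column built right after it, assuming a 5-row data frame as in the example (`n`, `pairs`, `idx` and `group` are illustrative names):

```r
n <- 5

# combn(n, 2) is a 2 x choose(n, 2) matrix; each column is one pair (i, j), i < j
pairs <- combn(n, 2)

# t() makes it choose(n, 2) x 2; as.numeric() then reads it column by column,
# so all first members of the pairs come first, then all second members
idx <- as.numeric(t(pairs))
print(idx)  # 1 1 1 1 2 2 2 3 3 4 2 3 4 5 3 4 5 4 5 5

# This mirrors the `group` column: positions k and k + choose(n, 2)
# share group k, i.e. they are the two rows of pair k
group <- rep(1:(length(idx) / 2), 2)
print(split(idx, group))
```

Within each group the earlier row comes first, so `.[2] - .[1]` is always "later row minus earlier row".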
deschen
  • thank you for sharing your approach :) I appreciate it. – FCL Oct 23 '21 at 22:17
  • @FCL see my update for a completely tidy way. – deschen Oct 23 '21 at 22:35
  • looks great, thanks. Suppose I have multiple columns, I then would use `across(-comparison, ~.[2] - .[1]))`. I tried it in my df and it worked well. Appreciate your help – FCL Oct 23 '21 at 23:20
  • Almost. Should be `across(-c(ID), …)`. Also, if the answer is what you are looking for, please upvote it and flag it as your accepted answer. – deschen Oct 24 '21 at 01:38
  • Nice solution, but is `as.numeric` required? This is enough I think: `slice(t(combn(nrow(.), 2)))`. – yarnabrina Oct 24 '21 at 04:45
  • thanks @deschen unfortunately I still have not earned enough reputation points to upvote the answers! – FCL Oct 24 '21 at 11:04
  • @deschen if I use across(-c(ID),...) I get the following message: Error: Problem with `summarise()` input `..2`. i `..2 = across(-c(ID), ~.[2] - .[1])`. x non-numeric argument to binary operator i The error occurred in group 1: group = 1. – FCL Oct 24 '21 at 11:49
  • @deschen but not when I do across(-comparison,...) – FCL Oct 24 '21 at 11:50
  • Sorry, should be `across(-c(ID, comparison), …)`. – deschen Oct 24 '21 at 15:19
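For data frames with many numeric columns, a base-R sketch can avoid naming columns altogether by subtracting matrix rows directly, using the pairs from `combn()`. It assumes every column is numeric (otherwise `as.matrix()` coerces everything to character) and uses a plain `data.frame` so it runs without extra packages; `as.matrix()` works the same way on a tibble. The names `m`, `pairs`, `diffs` and `result` are illustrative:

```r
set.seed(1)
my_df <- data.frame(a = runif(5), b = runif(5), c = runif(5))

# Assumes all columns are numeric; as.matrix() would otherwise coerce to character
m <- as.matrix(my_df)

# Each column of `pairs` is one (i, j) pair with i < j, in the same
# 2-1, 3-1, ..., 5-4 order as the tidyverse answer above
pairs <- combn(nrow(m), 2)

# Later row minus earlier row for every pair, all columns at once
diffs <- m[pairs[2, ], , drop = FALSE] - m[pairs[1, ], , drop = FALSE]
rownames(diffs) <- NULL

result <- data.frame(comparison = paste0(pairs[2, ], "-", pairs[1, ]),
                     diffs,
                     stringsAsFactors = FALSE)
result
```

Because the subtraction runs over whole matrix rows, no `across()` column selection is needed, so the non-numeric `comparison` column problem discussed in the comments does not arise.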

Kindly let me know if this is what you were anticipating.

library(tibble)
my_df <- tibble(a=runif(5), b=runif(5), c=runif(5))

# Generating the sequence to calculate the combinations
seq1 <- seq(1,nrow(my_df)) 
seq2 <- seq1

# Generating the Combinations
Combinations <- expand.grid(seq1, seq2)
# Keeping only pairs with Var1 > Var2 (drops self-pairs and mirrored duplicates)
Combinations <- Combinations[which(Combinations$Var2 < Combinations$Var1),]

# Performing the subtraction
result <- my_df[Combinations$Var1,] - my_df[Combinations$Var2,]

Update based on the comment:

library(dplyr)

result <- expand.grid(seq1, seq1) %>%
  filter(Var1 > Var2) %>%
  mutate(my_df[Var1, ] - my_df[Var2, ])  # unnamed data-frame result adds columns a, b, c