I'm not sure if I phrased my question properly, so let me give a simplified example:

Given a dataset as follows:

dat <- tibble(X = c("A", "B", "B", "C", "A"), 
              Y = c("B", "A", "C", "A", "C"))

How can I compute a pair variable that combines the values of X and Y in each row, without treating the same two values in a different order as a separate pair, as here:

dat$pair <- c("A-B", "A-B", "B-C", "C-A", "C-A")
dat
# A tibble: 5 × 3
  X     Y     pair 
  <chr> <chr> <chr>
1 A     B     A-B  
2 B     A     A-B  
3 B     C     B-C  
4 C     A     C-A  
5 A     C     C-A  

I can compute a pairing with paste0, but it will introduce duplicates (C-A is the same as A-C for me) that I want to avoid:

> dat <- mutate(dat, pair = paste0(X, "-", Y))
> dat
# A tibble: 5 × 3
  X     Y     pair 
  <chr> <chr> <chr>
1 A     B     A-B  
2 B     A     B-A  
3 B     C     B-C  
4 C     A     C-A  
5 A     C     A-C  
Anoushiravan R
blazej

3 Answers

We can use pmin and pmax to order the two values in each row element-wise and then paste them together.

transform(dat, pair = paste(pmin(X, Y), pmax(X, Y), sep = '-'))

#  X Y pair
#1 A B  A-B
#2 B A  A-B
#3 B C  B-C
#4 C A  A-C
#5 A C  A-C

If you prefer dplyr, this can be written as:

library(dplyr)

dat %>% mutate(pair = paste(pmin(X, Y), pmax(X, Y), sep = '-'))
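
To see why this works without any row-wise grouping, here is a minimal sketch on plain vectors. The names `x`, `y`, `lo` and `hi` are only for illustration and are not part of the answer; the values are the same as the X and Y columns above.

```r
# Two parallel character vectors (same values as the X and Y columns)
x <- c("A", "B", "B", "C", "A")
y <- c("B", "A", "C", "A", "C")

# pmin()/pmax() compare position by position and return one value per position
lo <- pmin(x, y)
hi <- pmax(x, y)

lo
# [1] "A" "A" "B" "A" "A"
hi
# [1] "B" "B" "C" "C" "C"

# min() instead collapses all of its inputs to a single value
min(x, y)
# [1] "A"
```

Because the smaller value always ends up first, "C" paired with "A" and "A" paired with "C" both produce the same label once pasted.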
Ronak Shah
  • All the solutions presented here are nice, but this is the real deal :) I've been meaning to ask: when we apply `pmax` or `pmin` to a data set, is it applied to every row? – Anoushiravan R Aug 28 '21 at 11:35

With dplyr and tidyr you could try:

library(dplyr)
library(tidyr)

dat %>% 
  rowwise() %>% 
  mutate(pair = list(c(X, Y)),
         pair = list(sort(pair)),
         pair = list(paste(pair, collapse = "-"))) %>% 
  select(pair) %>% 
  distinct() %>% 
  unnest(pair)
#> # A tibble: 3 x 1
#>   pair 
#>   <chr>
#> 1 A-B  
#> 2 B-C  
#> 3 A-C

Created on 2021-08-27 by the reprex package (v2.0.0)

data

dat <- data.frame(X = c("A", "B", "B", "C", "A"), 
                  Y = c("B", "A", "C", "A", "C"))
Peter
  • Thanks @Peter, I chose @Samet's response as it returns all the columns and not just the pairing. BTW, there is a comma missing after `pair = list(sort(pair))` in your code :) – blazej Aug 28 '21 at 09:07
  • Thanks for the feedback. I have added the comma, my omission. If you want all pairs, just remove the `distinct()` call. My reading of your question was that you wanted to "avoid duplicate pairs". – Peter Aug 28 '21 at 09:15

I sorted the two values within each row before pasting them together:

dat <- data.frame(X = c("A", "B", "B", "C", "A"), 
                  Y = c("B", "A", "C", "A", "C"))

library(dplyr)


dat %>%
  rowwise() %>%
  # sort the two values of the current row, then join them with "-"
  mutate(pair = paste0(sort(c(as.character(X), as.character(Y)), decreasing = FALSE), collapse = "-")) %>%
  ungroup()

Output:

  X     Y     pair 
  <fct> <fct> <chr>
1 A     B     A-B  
2 B     A     A-B  
3 B     C     B-C  
4 C     A     A-C  
5 A     C     A-C  
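
Since X and Y show up as factors in the output above, the same `as.character()` conversion can also be combined with `pmin` and `pmax` so that no row-wise grouping is needed. The sketch below is only one way to do this; `make_pair` is a hypothetical helper name, not a function from dplyr or base R, and `stringsAsFactors = TRUE` is set here only to reproduce the factor columns.

```r
# Hypothetical helper (not from any package): order-independent label for two columns
make_pair <- function(x, y, sep = "-") {
  # convert first so that factor columns are compared as plain strings
  x <- as.character(x)
  y <- as.character(y)
  # smaller value first, larger value second, joined position by position
  paste(pmin(x, y), pmax(x, y), sep = sep)
}

dat <- data.frame(X = c("A", "B", "B", "C", "A"),
                  Y = c("B", "A", "C", "A", "C"),
                  stringsAsFactors = TRUE)

dat$pair <- make_pair(dat$X, dat$Y)
dat$pair
# [1] "A-B" "A-B" "B-C" "A-C" "A-C"
```

All original columns are kept, and the helper works on whole columns at once, which tends to matter on larger data where `rowwise()` can be slow.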
Samet Sökel