
I have a data frame with two columns whose pairs of values repeat in reverse order (i.e. a given value is always paired with the same partner; only the column order changes).

Example:

col1 <- c('a', 'c', 'g', 'd', 'e', 'b', 'f', 'h')
col2 <- c('b', 'd', 'h', 'c', 'f', 'a', 'e', 'g')

df <- data.frame(col1, col2, stringsAsFactors = FALSE)

I want to add a column that identifies these combinations regardless of order (e.g. row 1 and row 6 are equivalent). The desired result would look like:

  col1 col2 ID
1    a    b  1
2    c    d  2
3    g    h  3
4    d    c  2
5    e    f  4
6    b    a  1
7    f    e  4
8    h    g  3
CoolGuyHasChillDay

3 Answers

df$grp <- interaction(do.call(pmin, df[1:2]), do.call(pmax, df[1:2]))

df
#   col1 col2 grp
# 1    a    b a.b
# 2    c    d c.d
# 3    g    h g.h
# 4    d    c c.d
# 5    e    f e.f
# 6    b    a a.b
# 7    f    e e.f
# 8    h    g g.h

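If you want numbers, you can then convert the factor to its integer codes. Note that the codes are not consecutive, because interaction() keeps a level for every possible combination of its inputs, including unused ones: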

df$grp <- as.integer(df$grp)

df
#   col1 col2 grp
# 1    a    b   1
# 2    c    d   6
# 3    g    h  16
# 4    d    c   6
# 5    e    f  11
# 6    b    a   1
# 7    f    e  11
# 8    h    g  16
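
As a possible follow-up (not part of the answer above): if the IDs should run 1, 2, 3, ... in order of first appearance, as in the question's expected output, one option is to match each key against the vector of unique keys. A minimal base R sketch; the variable name key is only illustrative:

```r
col1 <- c('a', 'c', 'g', 'd', 'e', 'b', 'f', 'h')
col2 <- c('b', 'd', 'h', 'c', 'f', 'a', 'e', 'g')
df <- data.frame(col1, col2, stringsAsFactors = FALSE)

# Order-independent key: smaller value first, larger value second
key <- paste(pmin(df$col1, df$col2), pmax(df$col1, df$col2), sep = ".")

# Position of each key among the unique keys = ID in order of first appearance
df$ID <- match(key, unique(key))

df$ID
# [1] 1 2 3 2 4 1 4 3
```

Using interaction(..., drop = TRUE) would also give consecutive codes, but numbered by sorted level order rather than by first appearance.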
IceCreamToucan

data.table one-liner

First, build the grouping key with apply:
apply(df, 1, function(x) paste0(sort(x), collapse = ''))
which results in
[1] "ab" "cd" "gh" "cd" "ef" "ab" "ef" "gh"
i.e. the sorted combination of col1 and col2 for each row.

Based on this vector, data.table can create a group number for each unique element, which you assign to the new ID variable using .GRP.

library(data.table)

setDT(df)[, ID := .GRP, by = apply(df, 1, function(x) paste0(sort(x), collapse = ''))][]

#    col1 col2 ID
# 1:    a    b  1
# 2:    c    d  2
# 3:    g    h  3
# 4:    d    c  2
# 5:    e    f  4
# 6:    b    a  1
# 7:    f    e  4
# 8:    h    g  3
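
One caveat worth illustrating, as a hedged side note rather than part of the answer above: collapsing the sorted values with an empty separator is safe for the single-letter values in the question, but with longer strings two different pairs can produce the same key. The sketch below uses two hypothetical helper functions (key_nosep and key_sep) and assumes the character "|" never occurs in the data:

```r
# Hypothetical helpers, for illustration only
key_nosep <- function(x) paste0(sort(x), collapse = "")   # as in the answer above
key_sep   <- function(x) paste0(sort(x), collapse = "|")  # assumes "|" is not in the data

# Two different pairs of values
p1 <- c("ab", "c")
p2 <- c("a", "bc")

key_nosep(p1) == key_nosep(p2)  # TRUE: both collapse to "abc"
key_sep(p1) == key_sep(p2)      # FALSE: "ab|c" vs "a|bc"
```

With a separator that cannot appear in the values, the same by = apply(...) call keeps distinct pairs in distinct groups.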
Wimpel

A solution using dplyr and purrr:

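library(dplyr)
library(purrr)

# Paste the two values in sorted order, so (a, b) and (b, a) give the same key
ordered_paste <- function(x, y) {
  paste0(sort(c(x, y)), collapse = "")
}

df %>%
  # map2_chr() returns a character vector; plain map2() would create a list column
  mutate(ID = map2_chr(col1, col2, ~ ordered_paste(.x, .y)))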
#   col1 col2 ID
# 1    a    b ab
# 2    c    d cd
# 3    g    h gh
# 4    d    c cd
# 5    e    f ef
# 6    b    a ab
# 7    f    e ef
# 8    h    g gh
Bart Spoon