How to make a variable elements pair each other within each group in r

Question

I have a dataframe like below:

data.frame(id = rep(1:2, each=4), 
           word = c('apple', 'pear', 'orange', 'banana',
                    'apple', 'watermellon','orange', 'grape') )

I need to convert the data to the form below. Need to pair up all words in the word column within each group (id) and create two columns word1 and word2.

id    word1      word2
1     apple      pear
1     apple      orange
1     apple      banana
1     pear       orange
1     pear       banana
1     orange     banana
2     apple      watermellon
2     apple      Orange

Shouldn't there be more combinations/ – akrun Jul 06 '19 at 21:32 — akrun, Jul 06 '19 at 21:32

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

We can group by 'id', use combn to get the pairwise combinations of 'word', and unnest the output

library(dplyr)
df1 %>%
  group_by(id) %>% 
  summarise(out = list(combn(word, 2, FUN = function(x)
     tibble(word1 = x[1], word2 = x[2]), simplify = FALSE))) %>%
  unnest %>%
  unnest
# A tibble: 12 x 3
#     id word1       word2      
#   <int> <fct>       <fct>      
# 1     1 apple       pear       
# 2     1 apple       orange     
# 3     1 apple       banana     
# 4     1 pear        orange     
# 5     1 pear        banana     
# 6     1 orange      banana     
# 7     2 apple       watermellon
# 8     2 apple       orange     
# 9     2 apple       grape      
#10     2 watermellon orange     
#11     2 watermellon grape      
#12     2 orange      grape

Or with data.table

library(data.table)
setDT(df1)[, as.data.frame(do.call(rbind, combn(as.character(word), 
     2, simplify = FALSE))), by = id]

NOTE: This one use combn which is efficient in only selecting the required combinations and without any joins

data

df1 <- data.frame(id = rep(1:2, each=4), 
           word = c('apple', 'pear', 'orange', 'banana',
                    'apple', 'watermellon','orange', 'grape') )

Some very clever data.table alternatives over here - https://stackoverflow.com/questions/26828301/faster-version-of-combn#comment73724402_26828713 — thelatemail, Jul 07 '19 at 09:12

score 1 · Answer 2 · answered Jul 06 '19 at 23:11

Here's a dplyr solution by joining the dataframe to itself and removing unwanted pairs -

df %>% 
  inner_join(df, by = "id") %>% 
  filter(
    word.x != word.y &
    !duplicated(t(apply(., 1, sort)))
  ) %>% 
  rename(word1 = word.x, word2 = word.y)

   id       word1       word2
1   1       apple        pear
2   1       apple      orange
3   1       apple      banana
4   1        pear      orange
5   1        pear      banana
6   1      orange      banana
7   2       apple watermellon
8   2       apple      orange
9   2       apple       grape
10  2 watermellon      orange
11  2 watermellon       grape
12  2      orange       grape

How to make a variable elements pair each other within each group in r

2 Answers2

data