1

I have a dataframe like below:

data.frame(id = rep(1:2, each=4), 
           word = c('apple', 'pear', 'orange', 'banana',
                    'apple', 'watermellon','orange', 'grape') )

I need to convert the data to the form below. Need to pair up all words in the word column within each group (id) and create two columns word1 and word2.

id    word1      word2
1     apple      pear
1     apple      orange
1     apple      banana
1     pear       orange
1     pear       banana
1     orange     banana
2     apple      watermellon
2     apple      Orange
zesla
  • 11,155
  • 16
  • 82
  • 147

2 Answers2

1

We can group by 'id', use combn to get the pairwise combinations of 'word', and unnest the output

library(dplyr)
df1 %>%
  group_by(id) %>% 
  summarise(out = list(combn(word, 2, FUN = function(x)
     tibble(word1 = x[1], word2 = x[2]), simplify = FALSE))) %>%
  unnest %>%
  unnest
# A tibble: 12 x 3
#     id word1       word2      
#   <int> <fct>       <fct>      
# 1     1 apple       pear       
# 2     1 apple       orange     
# 3     1 apple       banana     
# 4     1 pear        orange     
# 5     1 pear        banana     
# 6     1 orange      banana     
# 7     2 apple       watermellon
# 8     2 apple       orange     
# 9     2 apple       grape      
#10     2 watermellon orange     
#11     2 watermellon grape      
#12     2 orange      grape     

Or with data.table

library(data.table)
setDT(df1)[, as.data.frame(do.call(rbind, combn(as.character(word), 
     2, simplify = FALSE))), by = id] 

NOTE: This one use combn which is efficient in only selecting the required combinations and without any joins

data

df1 <- data.frame(id = rep(1:2, each=4), 
           word = c('apple', 'pear', 'orange', 'banana',
                    'apple', 'watermellon','orange', 'grape') )
Community
  • 1
  • 1
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Some very clever data.table alternatives over here - https://stackoverflow.com/questions/26828301/faster-version-of-combn#comment73724402_26828713 – thelatemail Jul 07 '19 at 09:12
1

Here's a dplyr solution by joining the dataframe to itself and removing unwanted pairs -

df %>% 
  inner_join(df, by = "id") %>% 
  filter(
    word.x != word.y &
    !duplicated(t(apply(., 1, sort)))
  ) %>% 
  rename(word1 = word.x, word2 = word.y)

   id       word1       word2
1   1       apple        pear
2   1       apple      orange
3   1       apple      banana
4   1        pear      orange
5   1        pear      banana
6   1      orange      banana
7   2       apple watermellon
8   2       apple      orange
9   2       apple       grape
10  2 watermellon      orange
11  2 watermellon       grape
12  2      orange       grape
Shree
  • 10,835
  • 1
  • 14
  • 36