2

I have data on users and the products they used over a certain time period:

dframe <- data.frame(id = c(1234,1234, rep(3456, 4)), 
                     product = c("Apple", "Pear", "Apple", "Pear", "Grapes", "Kiwi"))

  id product
1234   Apple
1234    Pear
3456   Apple
3456    Pear
3456  Grapes
3456    Kiwi

I'm looking for a way to create the unique pairs of products per user (where the pair x-y counts as the same pair as y-x). The solution would look like this:

solution
  id product1 product2
1234    Apple     Pear
3456    Apple     Pear
3456    Apple   Grapes
3456    Apple     Kiwi
3456     Pear   Grapes
3456     Pear     Kiwi
3456   Grapes     Kiwi

Essentially, I'd like to apply an equivalent of combn(product,2) after dplyr's group_by(id), if that makes sense. Any ideas how to approach this?

Thanks a lot for your help!

Kasia Kulma

3 Answers

3

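Here is an option using CJ from data.table:

library(data.table)

# Convert product to character first: pmin()/pmax() are not meaningful for factors
setDT(dframe)[, product := as.character(product)
  ][, CJ(product1 = product, product2 = product, unique = TRUE), by = id  # all ordered pairs per id
  ][product1 != product2                                                  # drop self-pairs
  ][!duplicated(data.table(id, pmin(product1, product2), pmax(product1, product2)))]  # keep one of x-y / y-x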
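As a possible simplification of the de-duplication step above, here is a minimal sketch that keeps only the rows where product1 sorts before product2. It assumes product is a character column (built here with data.table(), which does not convert strings to factors); the names dt and pairs are just illustrative. Note that each pair comes out in alphabetical order, so a row may read Grapes-Pear rather than Pear-Grapes, which should not matter if x-y and y-x count as the same pair.

```r
library(data.table)

# Same example data as in the question, with product stored as character
dt <- data.table(
  id = c(1234, 1234, rep(3456, 4)),
  product = c("Apple", "Pear", "Apple", "Pear", "Grapes", "Kiwi")
)

# Cross join product with itself within each id, then keep one orientation
# of every pair: the "<" filter drops self-pairs and mirrored duplicates
pairs <- dt[, CJ(product1 = product, product2 = product, unique = TRUE),
            by = id][product1 < product2]

pairs
```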
akrun
  • Hi @akrun, thanks for that, looks very promising, but the output I'm getting from it is duplicated pairs for 1234 id and only 4 options for 3456 id... – Kasia Kulma May 23 '17 at 13:46
  • @KasiaKulma I get 7 rows based on it. – akrun May 23 '17 at 13:47
  • I get 6 with the following warning message: `Warning messages: 1: In Ops.factor(mmm, each) : ‘>’ not meaningful for factors 2: In Ops.factor(mmm, each) : ‘<’ not meaningful for factors` Could that explain the discrepancies? – Kasia Kulma May 23 '17 at 13:54
  • @KasiaKulma replace `product1=product` with `product1=as.character(product)` and the same with product2 in `CJ` and you will get the desired result. – lmo May 23 '17 at 13:55
  • @KasiaKulma Sorry, I forgot to mention that I changed the class to `character` – akrun May 23 '17 at 14:16
2

You can find a few functions for building unique combinations in this post. We can borrow the function defined in that post by @Ferdinand.kraft:

expand.grid.unique <- function(x, y, include.equals=FALSE)
{
    x <- unique(x)

    y <- unique(y)

    g <- function(i)
    {
        z <- setdiff(y, x[seq_len(i-include.equals)])

        if(length(z)) cbind(x[i], z, deparse.level=0)
    }

    do.call(rbind, lapply(seq_along(x), g))
}

Then we can use it via dplyr as follows:

library(dplyr)

 dframe %>% 
   group_by(id) %>% 
   do(as.data.frame(expand.grid.unique(as.character(.$product), as.character(.$product))))

#Source: local data frame [7 x 3]
#Groups: id [2]

#     id     V1     V2
#  <dbl>  <chr>  <chr>
#1  1234  Apple   Pear
#2  3456  Apple   Pear
#3  3456  Apple Grapes
#4  3456  Apple   Kiwi
#5  3456   Pear Grapes
#6  3456   Pear   Kiwi
#7  3456 Grapes   Kiwi
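The same per-user idea can also be sketched in base R, without the helper or dplyr. This is only an illustration under a few assumptions: the helper name pairs_for_user and the result name solution are made up here, products are de-duplicated per user, and users with fewer than two products are skipped because combn() needs at least two items.

```r
# Example data from the question, with product kept as character
dframe <- data.frame(
  id = c(1234, 1234, rep(3456, 4)),
  product = c("Apple", "Pear", "Apple", "Pear", "Grapes", "Kiwi"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all unordered pairs for one user's products
pairs_for_user <- function(products) {
  products <- unique(as.character(products))
  if (length(products) < 2) return(NULL)  # nothing to pair up
  m <- t(combn(products, 2))              # one row per pair
  data.frame(product1 = m[, 1], product2 = m[, 2], stringsAsFactors = FALSE)
}

# Split products by id, build the pairs per user, and stack the pieces
by_id <- split(dframe$product, dframe$id)
solution <- do.call(rbind, lapply(names(by_id), function(i) {
  p <- pairs_for_user(by_id[[i]])
  if (is.null(p)) NULL else cbind(id = as.numeric(i), p)
}))

solution
```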
Sotos
2

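Here is an option that combines group_by %>% do with combn:

dframe %>% 
    group_by(id) %>% 
    do({
        # as.character() guards against product being stored as a factor
        pairs <- t(combn(as.character(.$product), 2))
        setNames(data.frame(pairs, stringsAsFactors = FALSE),
                 c("product1", "product2"))
    })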

#Source: local data frame [7 x 3]
#Groups: id [2]

#     id product1 product2
#  <dbl>    <chr>    <chr>
#1  1234    Apple     Pear
#2  3456    Apple     Pear
#3  3456    Apple   Grapes
#4  3456    Apple     Kiwi
#5  3456     Pear   Grapes
#6  3456     Pear     Kiwi
#7  3456   Grapes     Kiwi
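Since do() is superseded in recent dplyr releases, the same approach might be written with reframe() instead. This is a sketch that assumes dplyr 1.1.0 or later (where reframe() is available); pair_up and result are hypothetical names introduced only for this example.

```r
library(dplyr)

dframe <- data.frame(
  id = c(1234, 1234, rep(3456, 4)),
  product = c("Apple", "Pear", "Apple", "Pear", "Grapes", "Kiwi"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns a two-column data frame of pairs for one group
pair_up <- function(p) {
  m <- t(combn(as.character(p), 2))  # as.character() in case p is a factor
  data.frame(product1 = m[, 1], product2 = m[, 2], stringsAsFactors = FALSE)
}

# reframe() allows each group to return any number of rows;
# an unnamed data frame result is unpacked into columns
result <- dframe %>%
  group_by(id) %>%
  reframe(pair_up(product))

result
```

As with the do() version, a user with only one product would make combn() fail, so such groups would need to be filtered out first.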
Psidom
  • I'm getting the following error when running your code: `Error in names(object) <- nm : 'names' attribute [2] must be the same length as the vector [1]`, any ideas where the problem is? thanks! – Kasia Kulma May 23 '17 at 13:52
  • Did you use `t()` in front of `combn` to transpose the result? – Psidom May 23 '17 at 13:54
  • I copy-pasted your code correctly, if that's what you're asking :) – Kasia Kulma May 23 '17 at 13:56
  • @KasiaKulma Make sure your product is as.character – Sotos May 23 '17 at 13:58
  • Yeah. I think @Sotos is right. I forgot to mention that I used `stringsAsFactors = F` when constructing the original data frame. – Psidom May 23 '17 at 13:59
  • thanks, indeed, product was as.factor in my test set. Thanks a lot for that! Even though all solutions were relevant, yours I found most readable, thank you! – Kasia Kulma May 23 '17 at 14:00