
I have a data frame that looks like this:

My goal is to find the most common pair of items, in this case 1 and 3.

I already tried this:


names(tail(sort(table(unlist(tapply(ol$ORDER_ID, ol$SKU_ID,
                FUN = function(x) if(length(x) > 1) combn(unique(x), 2, paste, collapse="-") else NA)))),
           3))

But I keep getting this error message, and I don't know how to fix it.

Error in combn(unique(x), 2, paste, collapse = "-") : n < m

Someone suggested

library(dplyr), then count(your_data, ORDER_ID, SKU_ID) %>% arrange(desc(n))

But it still gives me the same error message. Another person referred me to this post, but I struggle to see the relevance.

  • `library(dplyr)`, then `count(your_data, ORDER_ID, SKU_ID) %>% arrange(desc(n))` – Gregor Thomas Nov 09 '20 at 14:15
  • I followed your advice, then put my code in, but the Error message stays the same `Error in combn(unique(x), 2, paste, collapse = "-") : n < m` – Arvan Theil Nov 09 '20 at 14:19
  • Yes, my code is an alternate method, not a fix for your method. – Gregor Thomas Nov 09 '20 at 14:41
  • Ah, but I see I misunderstood your question. It would be really helpful if you could share some reproducible sample data. `dput()` is the easiest way to do this, something like `dput(your_data[1:10, ])` for the first 10 rows. It's difficult to develop and test code based on a picture of data... – Gregor Thomas Nov 09 '20 at 14:45
  • @GregorThomas I have transcribed the data in the picture into my answer. – Allan Cameron Nov 09 '20 at 14:47
  • I think [this is a possible duplicate, and d.b's answer looks applicable](https://stackoverflow.com/a/45491650/903061). – Gregor Thomas Nov 09 '20 at 14:47
  • I tried that solution already, but it keeps giving me: `Error in combn(unique(x), min(2, length(x)), paste, collapse = "-") : n < m` – Arvan Theil Nov 09 '20 at 16:00
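
A possible explanation for the `n < m` error, offered as a sketch rather than a confirmed diagnosis: both versions of the code check `length(x)`, but then pass `unique(x)` to `combn()`. If a group contains the same value more than once (and the IDs are stored as character or factor), `length(x)` can be 2 or more while `unique(x)` has only one element, and `combn()` stops with `n < m`. The vector `x` and the helper `pairs_or_na()` below are hypothetical, made up for illustration.

```r
# Hypothetical group in which the same ID appears twice
x <- c("A", "A")

length(x) > 1        # TRUE, so the original guard lets this group through
length(unique(x))    # 1, so there are not enough elements to form a pair

# Capture the error message instead of stopping the script
msg <- tryCatch(
  combn(unique(x), 2, paste, collapse = "-"),
  error = function(e) conditionMessage(e)
)
msg

# Guard on the number of *unique* values instead (hypothetical helper)
pairs_or_na <- function(x) {
  u <- unique(x)
  if (length(u) > 1) combn(u, 2, paste, collapse = "-") else NA
}

pairs_or_na(x)                  # no error for the duplicated group
pairs_or_na(c("A", "B", "C"))   # all unordered pairs of the three values
```

Note also that `tapply(ol$ORDER_ID, ol$SKU_ID, ...)` splits the order IDs by SKU, so it builds pairs of orders. To build pairs of items within each order, the arguments would go the other way round: `tapply(ol$SKU_ID, ol$ORDER_ID, ...)`.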

1 Answer


In base R you could do:

tab <- table(ol$SKU_ID, ol$ORDER_ID)
as.numeric(combn(row.names(tab), 2)[,
                which.max(rowSums(apply(combn(row.names(tab), 2), 1, 
                         function(x) rowSums(tab[x,]))))])
#> [1] 1 3

Data used

ol <- data.frame(
  ORDER_ID = c(111, 111, 121, 121, 121, 121, 131, 131, 141, 141, 141),
  SKU_ID   = c(1, 2, 1, 3, 4, 5, 1, 3, 1, 3, 9))
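
As far as I can tell, the snippet above ranks each pair by the summed frequencies of its two items, which matches the co-occurrence count for this data but need not in general. A minimal alternative sketch that counts how many orders contain both items of a pair, using the same data; the names `inc`, `co`, and `best_pair` are my own, not from the answer:

```r
# Same data as in the answer
ol <- data.frame(
  ORDER_ID = c(111, 111, 121, 121, 121, 121, 131, 131, 141, 141, 141),
  SKU_ID   = c(1, 2, 1, 3, 4, 5, 1, 3, 1, 3, 9))

# Incidence matrix: one row per order, one column per SKU,
# 1 if the order contains that SKU at least once, else 0
inc <- 1 * (unclass(table(ol$ORDER_ID, ol$SKU_ID)) > 0)

# t(inc) %*% inc: entry [i, j] is the number of orders containing both SKU i and SKU j
co <- crossprod(inc)

# Keep each unordered pair once and drop the diagonal (an item paired with itself)
co[lower.tri(co, diag = TRUE)] <- 0

# Position of the first maximum, mapped back to SKU labels
idx <- which(co == max(co), arr.ind = TRUE)[1, ]
best_pair <- as.numeric(colnames(co)[idx])
best_pair
#> [1] 1 3
```

Ties are resolved by taking the first maximum; `which(co == max(co), arr.ind = TRUE)` without the `[1, ]` returns all tied pairs.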
Allan Cameron