0

I have 9 products. A basket can contain any combination of products such that each product appears 0-9 times in the basket. For example, basket1 might contain 0 of product1, 3 of product2, 1 of product3, etc.

I want to generate a unique BasketId for each basket. Rather than doing the obvious Id which, for the above example would be a long string like "product1_0 product2_3 ..." I want to use a numeric Id for storage efficiency.

Since there are 9 products and can each occur 10 times, there should be 10^9 possible combinations. So I should be able to create a mapping like

get_basketId <- function(p1, p2, p3, p4, p5, p6, p7, p8, p9){
  # given the count of each product, return a unique numeric id

}

which is also invertible

get_product_counts <- function(get_basketId){
  # given a basketId, return the count of each product

}

Any suggestions how to do this?

Note also that Cantor Pairing is one theoretical solution, but unfortunately Cantor Pairing results in unnecessarily large IDs that overflow.

Ben
  • 20,038
  • 30
  • 112
  • 189
  • 2
    Use a 9-digit base-10 number. – Matthew Lundberg Dec 27 '16 at 19:53
  • @MatthewLundberg great solution. I was hoping for something a little more generalizable but given my constraints your solution would be great. – Ben Dec 27 '16 at 19:54
  • Without the constraints, there is no general solution. – Matthew Lundberg Dec 27 '16 at 19:55
  • @MatthewLundberg I disagree. NxNxN...xN is countable, so there should be a nice mapping as long as none of the p_i's are infinite. – Ben Dec 27 '16 at 19:58
  • 1
    You can use alphabetical letters to go up to base 26 or to base 52 if you include capitals. The `letters` and `LETTERS` functions provide easy mapping one way. Then allow position to denote products. To save some space, you could convert these to factors. You could use answers from [this question](http://stackoverflow.com/questions/37239715/convert-letters-to-numbers/37239786) to perform a reverse mapping. – lmo Dec 27 '16 at 19:58
  • @Ben Given any particular maximum value, yes it is easy. But try to do it without knowing any of the maximum values in advance. – Matthew Lundberg Dec 27 '16 at 20:01
  • @Mathhew Lundberg This is still possible. Loosely speaking, you could generate a mapping that maps a number to every combination of {(0), (0), ... (0)}, followed by all combinations of {(0, 1), (0, 1), ... (0, 1)}, followed by all combinations of {(0, 1, 2), (0, 1, 2), ... (0, 1, 2)} etc. ...but I don't know an elegant way to code this. – Ben Dec 27 '16 at 20:05
  • this might be a simple question: just to clarify : why not `paste0()` the numbers in `get_basketId()` and vice versa in the other function? – joel.wilson Dec 27 '16 at 20:08
  • @joel.wilson Joins and such work a little faster when Ids are stored as ints rather than long strings. Also, printing data.frames/data.tables can be annoying when you have a column with long strings. – Ben Dec 27 '16 at 20:11
  • it won't be that long right; just a 9 character string always. anyway, it was just a random question. – joel.wilson Dec 27 '16 at 20:14

2 Answers2

2

I like hexmode for this. It is a shorter and neater id.

get_basketId <- function(...){
  args <- list(...)
  num <- as.integer(paste0(c(1, unlist(args)),collapse=""))
  as.hexmode(num)
}

get_product_counts <- function(basketId){
  num <- as.integer(as.hexmode(basketId))
  res <- as.integer(strsplit(as.character(num), "")[[1]])[-1]
  setNames(res, paste0("p", 1:length(res)))
}

get_basketId(1, 2, 3, 4)
#[1] "2be2"

get_product_counts("2be2")
#p1 p2 p3 p4 
# 1  2  3  4 
Pierre L
  • 28,203
  • 6
  • 47
  • 69
1

Where you have a collection of M different types of items, each of which can be specified 0..N times, the classic solution is to encode in a M-digit, base (N+1) number.

It's easier to demonstrate if we reduce the number of types, but the code is identical.

get_basketId <- function(p1, p2) {
    return (10*p2 + p1)
}

get_product_counts <- function(id) {
    return (list(id %/% 10, id %% 10))
}
Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112