10

I've seen a few solutions to similar problems, but they all require iteration over the number of items to be added together.

Here's my goal: from a list of numbers, find all of the combinations (without replacement) that add up to a certain total. For example, if I have numbers 1,1,2,3,5 and total 5, it should return 5,2,3, and 1,1,3.

I was trying to use combn but it required you to specify the number of items in each combination. Is there a way to do it that allows for solution sets of any size?

Joseph Wood
  • 7,077
  • 2
  • 30
  • 65
Kira Tebbe
  • 576
  • 8
  • 23
  • Possible duplicate of [Finding all possible combinations of numbers to reach a given sum](https://stackoverflow.com/questions/4632322/finding-all-possible-combinations-of-numbers-to-reach-a-given-sum) – 989 Nov 10 '18 at 00:07
  • 3
    @989, perhaps, but the only R solution there *returns nothing* (which is, in this context and my opinion, a waste of a function) and operates in side-effect by printing compactly formatted "formulas" to the console. It might be useful to some, but it defies functional programming and actually doing something with the findings. – r2evans Nov 10 '18 at 02:52

5 Answers5

10

This is precisely what combo/permuteGeneral from RcppAlgos (I am the author) were built for. Since we have repetition of specific elements in our sample vector, we will be finding combinations of multisets that meet our criteria. Note that this is different than the more common case of generating combinations with repetition where each element is allowed to be repeated m times. For many combination generating functions, multisets pose problems as duplicates are introduced and must be dealt with. This can become a bottleneck in your code if the size of your data is decently large. The functions in RcppAlgos handle these cases efficiently without creating any duplicate results. I should mention that there are a couple of other great libraries that handle multisets quite well: multicool and arrangements.

Moving on to the task at hand, we can utilize the constraint arguments of comboGeneral to find all combinations of our vector that meet a specific criteria:

vec <- c(1,1,2,3,5)  ## using variables from @r2evans
uni <- unique(vec)
myRep <- rle(vec)$lengths
ans <- 5

library(RcppAlgos)
lapply(seq_along(uni), function(x) {
    comboGeneral(uni, x, freqs = myRep,
                 constraintFun = "sum",
                 comparisonFun = "==",
                 limitConstraints = ans)
})

[[1]]
[,1]
[1,]    5

[[2]]
[,1] [,2]
[1,]    2    3

[[3]]
[,1] [,2] [,3]
[1,]    1    1    3

[[4]]
[,1] [,2] [,3] [,4]  ## no solutions of length 4

These functions are highly optimized and extend well to larger cases. For example, consider the following example that would produce over 30 million combinations:

## N.B. Using R 4.0.0 with new updated RNG introduced in 3.6.0
set.seed(42)
bigVec <- sort(sample(1:30, 40, TRUE))

rle(bigVec)
Run Length Encoding
  lengths: int [1:22] 2 1 2 3 4 1 1 1 2 1 ...
  values : int [1:22] 1 2 3 4 5 7 8 9 10 11 ...

bigUni <- unique(bigVec)
bigRep <- rle(bigVec)$lengths
bigAns <- 199
len <- 12

comboCount(bigUni, len, freqs = bigRep)
[1] 32248100

All 300000+ results are returned very quickly:

system.time(bigTest <- comboGeneral(bigUni, len, freqs = bigRep,
                                    constraintFun = "sum",
                                    comparisonFun = "==",
                                    limitConstraints = bigAns))
 user  system elapsed 
0.273   0.004   0.271

head(bigTest)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,]    1    1    2    3    4   25   26   26   26    27    28    30
[2,]    1    1    2    3    5   24   26   26   26    27    28    30
[3,]    1    1    2    3    5   25   25   26   26    27    28    30
[4,]    1    1    2    3    7   24   24   26   26    27    28    30
[5,]    1    1    2    3    7   24   25   25   26    27    28    30
[6,]    1    1    2    3    7   24   25   26   26    26    28    30

nrow(bigTest)
[1] 280018

all(rowSums(bigTest) == bigAns)
[1] TRUE

Addendum

I must mention that generally when I see a problem like: "finding all combinations that sum to a particular number" my first thought is integer partitions. For example, in the related problem Getting all combinations which sum up to 100 in R, we can easily solve with the partitions library. However, this approach does not extend to the general case (as we have here) where the vector contains specific repetition or we have a vector that contains values that don't easily convert to an integer equivalent (E.g. the vector (0.1, 0.2, 0.3, 0.4) can easily be treated as 1:4, however treating c(3.98486 7.84692 0.0038937 7.4879) as integers and subsequently applying an integer partitions approach would require an extravagant amount of computing power rendering this method useless).

Joseph Wood
  • 7,077
  • 2
  • 30
  • 65
6

I took your combn idea and looped over the possible sizes of the sets.

func = function(x, total){
    M = length(x)
    y = NULL
    total = 15
    for (m in 1:M){
        tmp = combn(x, m)
        ind = which(colSums(tmp) == total)
        if (length(ind) > 0){
            for (j in 1:length(ind))
                y = c(y, list(tmp[,ind[j]]))
            }
        }
    return (unique(lapply(y, sort)))
    }

x = c(1,1,2,3,5,8,13)

> func(x, 15)
[[1]]
[1]  2 13

[[2]]
[1]  1  1 13

[[3]]
[1] 2 5 8

[[4]]
[1] 1 1 5 8

[[5]]
[1] 1 1 2 3 8

Obviously, this will have problems as M grows since tmp will get big pretty quickly and the length of y can't be (maybe?) pre-determined.

mickey
  • 2,168
  • 2
  • 11
  • 20
  • I like this method; the only issue is it'll return duplicates in a different order (like both `[1,4]` and `[4,1]`). – Kira Tebbe Nov 10 '18 at 18:42
  • 1
    Modified so no duplicates – mickey Nov 10 '18 at 18:47
  • Great edit! Missing right parenthesis at the end of the `return` statement, but a great solution. Thank you! – Kira Tebbe Nov 10 '18 at 18:50
  • 1
    Thanks, I would still recommend Joseph Wood's answer especially if you're working with large data. – mickey Nov 10 '18 at 18:58
  • Thanks for the answer! BTW, how to find out the combination of numbers that add to more than a number? Taken account this example, how do we find the number of combination of the element in the vector that add up to above 15? – Tong Claire Xu Jan 27 '21 at 15:33
  • @TongClaireXu Haven't tested this, but you should be able to just change `ind = which(colSums(tmp) == total)` to `ind = which(colSums(tmp) > total)`. – mickey Jan 27 '21 at 18:05
5

Similar to mickey's answer, we can use combn inside another looping mechanism. I'll use lapply:

vec <- c(1,1,2,3,5)
ans <- 5

Filter(length, lapply(seq_len(length(vec)),
       function(i) {
         v <- combn(vec, i)
         v[, colSums(v) == ans, drop = FALSE]
       }))
# [[1]]
#      [,1]
# [1,]    5
# [[2]]
#      [,1]
# [1,]    2
# [2,]    3
# [[3]]
#      [,1]
# [1,]    1
# [2,]    1
# [3,]    3

You can omit the Filter(length, portion, though it may return a number of empty matrices. They're easy enough to deal with and ignore, I just thought removing them would be aesthetically preferred.

This method gives you a matrix with multiple candidates in each column, so

ans <- 4
Filter(length, lapply(seq_len(length(vec)),
       function(i) {
         v <- combn(vec, i)
         v[, colSums(v) == ans, drop = FALSE]
       }))
# [[1]]
#      [,1] [,2]
# [1,]    1    1
# [2,]    3    3
# [[2]]
#      [,1]
# [1,]    1
# [2,]    1
# [3,]    2

If duplicates are a problem, you can always do:

Filter(length, lapply(seq_len(length(vec)),
       function(i) {
         v <- combn(vec, i)
         v <- v[, colSums(v) == ans, drop = FALSE]
         v[,!duplicated(t(v)),drop = FALSE]
       }))
# [[1]]
#      [,1]
# [1,]    1
# [2,]    3
# [[2]]
#      [,1]
# [1,]    1
# [2,]    1
# [3,]    2
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • Can duplicates be a problem? I think `combn` gives unique columns. – mickey Nov 09 '18 at 23:53
  • 1
    Duplicates are a problem when the input vector has duplicates (1 is listed twice in the input vector). A duplicate is demonstrated in the second example. `combn` works in the elements themselves, not just unique elements. – r2evans Nov 09 '18 at 23:55
  • This looks good, but I copied your code and got an error (`Error in colSums(v) : 'x' must be an array of at least two dimensions`). Trying to figure out why. – Kira Tebbe Nov 10 '18 at 04:53
  • What r2evans calls `vec`, you were calling `x`. Make sure you have the names of the variables correct. – mickey Nov 10 '18 at 14:45
  • The issue seems to be when that `combn(vec, i)` returns a list, not a matrix, when `i` is equal to the length of elements in `vec`. It has an easy workaround, but I'm not sure why that wasn't happening here. – Kira Tebbe Nov 10 '18 at 18:35
  • When exactly does `combn` return a `list`? Granted, if you force it with `combn(..., simplify=FALSE)` things are different, but `combn(1:5,5)` does not return a list for me. Are you intending to use `simplify=FALSE`? – r2evans Nov 11 '18 at 01:11
  • Thanks for the answer! BTW, how to find out the combination of numbers that add to more than a number? Taken account this example, how do we find the number of combination of the element in the vector that add up to above 5? – Tong Claire Xu Jan 27 '21 at 15:34
  • As a quick hack, if `vec` is a vector `vec` and you want to know how many combinations of length-2 will sum up above `someval`, then `sum(colSums(combn(vec, 2)) > someval)` will work. If this isn't enough for you, I suggest you open a new question. – r2evans Jan 27 '21 at 15:38
5

Now here is a solution involving gtools:

# Creating lists of all permutations of the vector x
df1 <- gtools::permutations(n=length(x),r=length(x),v=1:length(x),repeats.allowed=FALSE)
ls1 <- list()
for(j in 1:nrow(df1)) ls1[[j]] <- x[df1[j,1:ncol(df1)]]  

# Taking all cumulative sums and filtering entries equaling our magic number
sumsCum <- t(vapply(1:length(ls1), function(j) cumsum(ls1[[j]]), numeric(length(x))))
indexMN <- which(sumsCum == magicNumber, arr.ind = T)
finalList <- list()
for(j in 1:nrow(indexMN)){
    magicRow <- indexMN[j,1]
    magicCol <- 1:indexMN[j,2]
    finalList[[j]] <- ls1[[magicRow]][magicCol]
}
finalList <- unique(finalList)

where x = c(1,1,2,3,5) and magicNumber = 5. This is a first draft, I am sure it can be improved here and there.

niko
  • 5,253
  • 1
  • 12
  • 32
3

Not the most efficient but the most compact so far:

x <- c(1,1,2,3,5)
n <- length(x)
res <- 5
unique(combn(c(x,rep(0,n-1)), n, function(x) x[x!=0][sum(x)==res], FALSE))[-1]
# [[1]]
# [1] 1 1 3
# 
# [[2]]
# [1] 2 3
# 
# [[3]]
# [1] 5
# 
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167