2

Sorry in advance if the answer (1) is trivial; or (2) out there but I haven't been able to solve this issue or find and answer online. Any pointers will be much appreciated!

I am in need of a piece of code that can run through a vector and return all possible subsets of elements whose cumulative sum passes a threshold value.

Note that I do not want only the subsets that give me exactly the threshold. The cumulative sum can be above the threshold, as long as the algorithm stops adding an extra element if the value has been achieved already.

# A tiny example of the kind of input data. 
# However, note that efficiency is an issue 
# (I need to replicate the example in a large dataset)
v <- seq(1, 3) # My vector
threshold <- 3 # My threshold value

# I would like to get a list with the combinations
# 1 2 
# 1 3
# 2 3
# 3 

This piece of code works but is the clunkiest solution on earth...

for (i in 1: length(v)){
  thisvalue <- v[i]
  if (thisvalue >=threshold) { 
    cat (v[i], "\n",sep="\t") 
  } else {
    for (j in (i+1): length(v)){
      thisvalue <- v[i]+v[j]
      if (thisvalue >=threshold) { 
        cat (c(v[i], v[j]), "\n",sep="\t")
      } else {
        for (k in (i+2): length(v)){
          thisvalue <- v[i]+v[j]+v[k]
          if (thisvalue >=threshold) { 
            cat(c(v[i],v[j],v[k]),"\n",sep="\t")
        }
        }
      }
    }
  }
}
icco
  • 21
  • 2
  • 6
    I don't think your requirement is well-posed. How is (2, 5) valid if (5) is also valid and is a subset of (2, 5)? Perhaps you are misusing the word "combinations", which implies unordered subsets of the starting set, when what you really want is ordered sub-sequences of the original ordered vector. – Ryan C. Thompson Jan 05 '16 at 23:36
  • Hi Ryan, my use of the word 'combination' was to make clear the order didn't matter (this is, I was not interested in getting both, [1, 2] and [2,1]. Even if it's a relatively simple problem I had issues putting it in words. Given that your wording is more appropriate I reworded the text accordingly, thanks! – icco Jan 06 '16 at 17:09
  • That's better, but your example is still inconsistent. You say order doesn't matter, but your example implies that order is indeed significant. If (5) is a subset with a sum above the threshold of 3, then how is it valid for the algorithm to also return (2,5), which is equivalent to (5,2)? You're describing an algorithm that is fundamentally order-dependent (i.e. "keep adding elements until threshold is reached"), so the results of that algorithm are going to be order dependent. – Ryan C. Thompson Jan 06 '16 at 20:26
  • Hi again. Not sure exactly what's the misunderstanding here. I said the order doesn't matter because I'm equally happy with a (2,5) and a (5,2), as long as I only get one answer. Sorry for the inexact vocabulary.... maybe the best way to explain it is to write down the wrong way of solving this... I'll edit my post. – icco Jan 06 '16 at 21:27
  • I'm not sure how you would be happy with *either* of (2,5) or (5,2), if 5 by itself is already above your threshold without adding 2 into it. You edited your example, so using your new example, I'm not sure how any of (1,3), (2,3), (3,2), or (3,1) would be acceptable solutions, since they are all supersets of another solution, (3). If you want to treat order as being unimportant, and only want minimal subsets achieving at least the threshold sum, I would think the only two correct solutions are (1,2) and (3). Every other set achieving the threshold is a superset of one of those two. – Ryan C. Thompson Jan 06 '16 at 21:45

2 Answers2

1

This may be an option:

library(utils)
v <- seq (1,5)
v.len <- length(v)
threshold <- 3
for (count in seq(1,v.len))
{
  print(paste("subset length",count))
  combinations <- combn(v,count)
  out <- combinations[,apply(combinations, 2, sum)>=threshold]
  print (out)
}

above produces following output:

[1] "subset length 1"
[1] 3 4 5
[1] "subset length 2"
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    2    2    2    3    3     4
[2,]    2    3    4    5    3    4    5    4    5     5
[1] "subset length 3"
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    1    1    2    2    2     3
[2,]    2    2    2    3    3    4    3    3    4     4
[3,]    3    4    5    4    5    5    4    5    5     5
[1] "subset length 4"
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    2
[2,]    2    2    2    3    3
[3,]    3    3    4    4    4
[4,]    4    5    5    5    5
[1] "subset length 5"
[1] 1 2 3 4 5

so you'd need to do something with the output / decide where to store it etc.

rbm
  • 3,243
  • 2
  • 17
  • 28
  • Hi rbm, thanks for the input... the problem is that your code gives me ALL subsets with cumulative sum above the threshold, but the algorithm doesn't meet my second constraint, that it should stop adding an extra element if the value has been achieved already. Therefore, there should be no subsets of length more than two (i.e. any combination of two numbers adds >=3). I'm still trying to figure it out... I'll let you know if I get an answer! – icco Jan 06 '16 at 19:53
0

I found a solution within my limited coding skills that might be inefficient but it is feasible and neater than writing endless loops.

The function was inspired by the java code found at Find all subsets of a set that sum up to n

recursive.subset <-function(x, index, current, threshold, result){
  for (i in index:length(x)){
    if (current + x[i] >= threshold){
      cat (result, x[i], "\n",sep="\t") 
    } else {
      recursive.subset(x, i + 1, current+x[i], threshold, c(result,x[i]))
    }
  }
}

To call the function, just

inivector <- vector(mode="numeric", length=0) #initializing empty vector
recursive.subset (v, 1, 0, threshold, inivector)

So I get

1 2
1 3
2 3
3

Community
  • 1
  • 1
icco
  • 21
  • 2