3

I have a large array of integers (~3k items), and I want to find the indices of every combination of numbers where the sum of said numbers is equal to X. How to do this without the program taking years to execute?

I can find the first combination possible with the following Python code:

def find_numbers_that_sum_to(source_arr, target_number, return_index=True):
    result = []
    result_indices = []
    
    for idx, item in enumerate(source_arr):
        sum_list = sum(result)
        
        assert (sum_list + item) <= target_number
        result.append(item)
        result_indices.append(idx)
        
        if (sum_list + item) == target_number:
            break
    
    return result_indices

But I need every combination possible. At least a way of generating them dynamically. I only ever need one of them at a time, but if the indices it gave me don't match another criteria I need, I'll need the next set.

  • 2
    One quick suggestion is that you could `yield` instead of returning a list. That would allow the function to yield a single result, which you could evaluate as per your criteria and ask for another only if required. – Pranav Hosangadi Jan 12 '23 at 17:56
  • 1
    Well *every combination possible* of 3000 integers is going to take a while to list. Unless there are some constraints on that it is, as you already know, going to take years to execute. – High Performance Mark Jan 12 '23 at 17:57
  • 1
    Your comment on @ggeop 's answer, *"Unfortunately I need the original indices for the items, which means I can't sort them :( "* is more pessimistic than needed. There are ways to sort an array and produce the permutation, so that the indices in the original array can easily be deduced from the indices in the sorted array. See for instance [numpy.argsort](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html) – Stef Jan 12 '23 at 21:31
  • How large is X value? – MBo Jan 13 '23 at 03:48
  • Related/maybe dupe [How to find all combinations that sum up to at most a constant?](https://stackoverflow.com/q/46942681/674039). The problem is called "subset sum". – wim Jan 14 '23 at 04:00

3 Answers3

4

Unfortunately, this problem is NP-hard, meaning that there's no currently known polynomial time algorithm for this. If you want to benchmark your current code against other implementations, this SO question contains a bunch. Those implementations may be faster, but their runtimes will still be exponential.

BrokenBenchmark
  • 18,126
  • 7
  • 21
  • 33
  • Thanks for the answer! To elaborate, I have a dataframe with two columns (on top of their IDs), being amount and price. I need to find the specific combination of rows where the sum of amounts and weighted average of prices are equal to a specific X and Y. Do you have any idea if that somehow helps simplify the simplicity of this problem? – Gabriel Tkacz Jan 12 '23 at 19:45
  • I found [this](https://stackoverflow.com/q/55857874/17769815), but I don't know whether it's NP-hard. – BrokenBenchmark Jan 12 '23 at 20:37
0

Okay, so I know this won't work on an array of 3000 values, but I wrote this and it works pretty well. I don't know python, sorry, so here it is in c.

#include <stdio.h> 
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>
#include <math.h>

typedef struct {
  int64_t num;
  _Bool free;
} element_t;

element_t input[] = {{5, true}, {2, true}, {5, true}, {-9, true}, {11, true}, {14, true}, {16, true}, {-10, true}, {6, true}, {24, true}, {53, true}, {-46, true}, {34, true}, {19, true}, {7, true}, {55, true}, {69, true}};
int64_t input_size = sizeof(input) / sizeof(element_t);
void printsum() {  
  int64_t i;
  for (i=0; i<input_size; i++) {
    if (!input[i].free) printf("%li(%li) ", input[i].num, i);
  }
  printf("\n");
  return;
}
int64_t getsum() {  
  int64_t i, sum;
  sum = 0;
  for (i=0; i<input_size; i++) {
    if (!input[i].free) sum += input[i].num;
  }
  return sum;
}
void findresults(int64_t target, int64_t current_sum) {
  //if (target < current_sum) return; //NOTE: Comment Out This line If Array Contains -ve Values.
  int64_t i, sum;
  sum = 0;
  for (i=0; i<input_size; i++) {
    if (input[i].free != false) {
      input[i].free = false;
      if (current_sum + input[i].num == target) printsum();
      findresults(target, current_sum + input[i].num);
      input[i].free = true;
    } else {
      return;
    }
  }
}

int main(int argc, char*argv[]) {
  if (argc <= 1) exit(0);
  int64_t target_sum = atol(argv[1]);
  printf("Searching for subsets sums to %li.\n", target_sum);
  findresults(target_sum, 0);
}
Simon Goater
  • 759
  • 1
  • 1
  • 7
0

Here is another answer, which might be of some use for your particular set of data. Going other way around, starting with X.

There is known problem called Integer partition, meaning find all positive integers summing to X.

So, first partition X into sets of integer, and you could drop all sets which are longer than N=3k. Second, match sets back to your N=3k numbers and find indices.

Good place to start with integer partition is Jorg Andt FXT library and book, at https://www.jjj.de/

Looking at https://oeis.org/A000041/graph, it will work only for X being rather small, grows is really exponential (well, being NP-hard is what it is)

Severin Pappadeux
  • 18,636
  • 3
  • 38
  • 64