0

Here's the problem: given a sequence of numbers, split these numbers into 2 sequences, so that the difference between the two sequences is the minimum. For example, given the sequence: [5, 4, 3, 3, 3] the solution is:
[5, 4] -> sum is 9
[3, 3, 3] -> sum is 9
The difference is 0

In other terms, can you find an algorithm (C language preferred) that given an input vector (variable size) of integers, can output two vector where the difference between the two sum is minimum?
Brutal force algorithm should be avoided.

To be sure to get the right solution, should be nice to compare in a benchmark the results between your algorithm and a brutal force algorithm.

Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
  • I can't find an algorithm for this.. :-( I've tried to use this simple algorithm: order the number descending (like in the example), then put the numbers in the vectors with this simple rule: use the vector where the current sum is minimum. Works in many case, but not always. In the given example, it doesn't work, because the result will be: v1: [5, 3] and v2: [4, 3, 3] – user2181556 Mar 18 '13 at 09:05
  • 1
    this is a NPC problem. – Bartlomiej Lewandowski Mar 18 '13 at 09:07
  • The given problem is simple to explain, but looks hard to solve... – user2181556 Mar 18 '13 at 09:10
  • http://en.wikipedia.org/wiki/Partition_problem this might have an answer to your question – Bartlomiej Lewandowski Mar 18 '13 at 09:10
  • Thank you! I've been scared by the fact that my problem is considered "The Easiest Hard Problem", so my work seems to be hard now.... – user2181556 Mar 18 '13 at 09:18
  • Your wording is unfortunately *maximally* unclear. Since you used the word "sequence", I'm guessing you have an ordered list of N numbers, and you want to find which of the N-1 positions between these numbers to split the list in two: Dukeling's answer solves this very simple problem. OTOH you *might* be meaning "subsets" instead, where the numbers in each subset don't have to be contiguous, in which case you can't do much better than brute force. If you intended the 2nd meaning then you could have chosen *almost any other* example, and it would have made this clear. – j_random_hacker Mar 18 '13 at 16:53

1 Answers1

2

It sounds like a sub-arrays problem (which is my interpretation of "sequences").

Meaning the only possibilities for 5, 4, 3, 3, 3 are:

| 5, 4, 3, 3, 3  =>  0 - 18 => 18
 5 | 4, 3, 3, 3  =>  5 - 13 => 8
 5, 4 | 3, 3, 3  =>  9 -  9 => 0
 5, 4, 3 | 3, 3  => 12 -  6 => 6
 5, 4, 3, 3 | 3  => 15 -  3 => 12
 5, 4, 3, 3, 3 | => 18 -  0 => 18 (same as first)

It is as simple as just comparing the sums on either side of every index.

Code: (untested)

int total = 0;
for (int i = 0; i < n; i++)
  total += arr[i];
int best = INT_MAX, bestPos = -1, current = 0;
for (int i = 0; i < n; i++)
{
  current += arr[i];
  int diff = abs(current - total);
  if (diff < best)
  {
    best = diff;
    bestPos = i;
  }
  // else break; - optimisation, may not work
}

printf("The best position is at %d\n", bestPos);

The above is O(n), logically, you can't do much better than that.

You can slightly optimize the above by doing a binary-search-like process on the sequence to get down to n + log n rather than 2n, but both are O(n). Basic pseudo-code:

sum[0] = arr[0]
// sum[i] represents sum from indices 0 to i
for (i = 1:n)
  sum[i] = sum[i-1] + arr[i]
total = sum[n]
start = 0
end = n
best = MAX
repeat:
  if (start == end) stop
  mid = (start + end) / 2
  sumFromMidToN = sum[n] - sum[mid]
  best = max(best, abs(sumFromMidToN - sum[mid]))
  if (sum[mid] > sumFromMidToN)
    end = mid
  else if (sum[mid] < sumFromMidToN)
    start = mid
  else
    stop

If it's actually subsets, then, as already mentioned, it appears to be the optimization version of the Partition problem, which is a lot more difficult.

Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138