Group the numbers C++

Question

Here's the problem:

You have N (N represents the number of numbers that you have) numbers. Divide them in 2 groups in such way that the difference between sums of the numbers in the groups is minimal.

Examples:

5 // N

1, 9, 5, 3, 8 // The numbers

The difference is 0 if we put 1, 9 and 3 in Group A and 5 and 8 in Group B.

I think first I should calculate the sum of all numbers and divide it by 2. Then to check ever possible combination of numbers, whose sum is not higher than half of the sum of all numbers. After I do this I will choose the biggest number and print out the groups.

I have problem with going through all combinations, especialy when N is big numbers. How can I run through all combinations?

Also i think a little bit differently, I will group the numbers in descending order and i'll put the biggest number in Group A and lowest in Group B. Then I do the other way around. This works with some of the numbers, but sometimes it doesn't show the optimal grouping. For example:

If I use the previous example. Arrange the number in descending order.

9, 8, 5, 3, 1.

Put the biggest in Group A and lowest in Group B.

Group A: 9
Group B: 1

Other way around.

Group A: 9, 3
Group B: 1, 8

And so on. If in the end i have only one number I'll put it in the group with lower sum. So I finally will get:

Group A: 9, 3
Group B: 1, 8, 5

This isn't the optimal grouping because the difference is 2, but with grouping in different way the difference can be 0, as I showed.

How can I get optimal grouping?

CODE:

#include <iostream>
#include <cmath>
#include <string>
using namespace std;
int convertToBinary(int number) {
    int remainder;
    int binNumber = 0;
    int i = 1;
    while(number!=0)
    {
        remainder=number%2;
        binNumber=binNumber + (i*remainder);
        number=number/2;
        i=i*10;
    }
    return binNumber;
}
int main()
{
    int number, combinations, sum = 0;
    double average;
    cin >> number;
    int numbers[number];
    for(int i = 0; i<number; i++)
    {
        cin >> numbers[i];
        sum += numbers[i];
    }
    if(sum%2 == 0)
    {
        average = sum/2;
    }
    else
    {
        average = sum/2 + 0.5;
    }
    combinations = pow(2,number-1);
    double closest = average;
    for(int i = 0; i<=combinations;i++)
    {
        int rem;
        int temp_sum = 0;
        int state = convertToBinary(i);
        for(int j = 0; state!=0; j++)
        {
            int rem =state%10;
            state = state/10;
            if(rem == 1)
            {
                temp_sum = temp_sum + numbers[j];
            }
        }
        if(abs(average-temp_sum)<closest)
        {
            closest = abs(average-temp_sum);
            if(closest == 0)
            {
                break;
            }
        }
    }
    cout << closest*2;
    return 0;
}

This is a variation of the Knapsack problem which is NP-Complete. This means you can't find an optimal solution in polynomial time (as you correctly identified, for large N, iterating over the all combinations would be costly) — T. Kiley, Mar 01 '13 at 00:14
If the numbers have a relatively limited range, there's a fairly effecient ("pseudo-polynomial") algorithm (and this is a [duplicate](http://stackoverflow.com/q/5717849/179910)). As an aside, being NP-complete doesn't mean you *can't* find an efficient algorithm -- it only means a lot of people have looked for a long time, and failed, so if you do so, it'll be a *massive* breakthrough. — Jerry Coffin, Mar 01 '13 at 00:20
I forgot to add that the problem states N is lower or equal to 300 and numbers are lower or equal to 100 000 — Stefan4024, Mar 01 '13 at 00:27

Neil Townsend · Answer 1 · 2013-03-08T10:59:16.410

2

Although this, as others have commented, is an NP-Complete problem, you have provided two fairly helpful bounds: you only want to split the group of numbers into two groups and you want to get the sums of the two groups as close as possible.

Your suggestion of working out the total sum of the numbers and dividing it by two is the right starting point - this means you know what the ideal sum of each group is. I also suspect that your best bet is to start by putting the largest number into, say, group A. (it has to go into one group, and it's the worst one to place later, so why not put it in there?)

This is when we get into heuristics which you cycle through until the groups are done:

N: Size of list of numbers.
t: sum of numbers divided by two (t is for target)

1. Is there a non-placed number which gets either group to within 0.5 of t? If so, put it in that group, put the remaining numbers in the other group and you're done.
2. If not, place the biggest remaining number in the group with the current lowest sum
3. go back to 1.

There will doubtless be cases that fail, but as a rough approach this should get close fairly often. To actually code the above, you will want to put the numbers in an ordered list so it is easy to work through them from largest to smallest. (Step 1 can then also be streamlined by checking (against both "groups so far") from largest remaining down until the "group so far" added to the number being checked are more then 1.0 below t - after that the condition cannot be met.)

Do let me know if this works!

edited Mar 08 '13 at 10:59

answered Mar 06 '13 at 18:34

Neil Townsend

6,024
5
35
52

8 @Neil Townsend It works only when difference between the numbers is low and when the difference between sums is 0. Here is example where it doesn't work. 301, 301, 300, 241, 173, 168, 118,117 Average is 859.5 – Stefan4024 Mar 08 '13 at 17:33
2. Check for 301 (301 + 301 isn't near 859.5). So we add 301 to group B – Stefan4024 Mar 08 '13 at 17:37
3. Check for 300 (301 + 300 = 601 < 859.5). So we add 300 to group B – Stefan4024 Mar 08 '13 at 17:39
4. Check for 241 (301 + 300 + 241 = 842 <859.5). So we add 241 to group A. – Stefan4024 Mar 08 '13 at 17:40
5. Check for 173 (301 + 300 + 173 = 774 <859.5). So we add 173 to Group A. – Stefan4024 Mar 08 '13 at 17:41
6. Check for 168 (301 + 241 + 173 + 168 = 883>859.5; 301 + 300 + 168 = 769<859.5). So we add 168 to group B. – Stefan4024 Mar 08 '13 at 17:41
7. Check for 118 (301 + 300 + 168 + 118 = 887>859.5; 301 + 241 + 173 + 118 = 833<859.5). So we add 118 to group A. – Stefan4024 Mar 08 '13 at 17:42
8. Check for 117 (301 + 241 + 173 + 118 + 117 = 950<859.5; 301 + 300 + 168 + 117 = 886>859.5). So we add 117 to group B. – Stefan4024 Mar 08 '13 at 17:42
Final result: Group A: 301, 241, 173, 118 = 833 Group B: 301, 300, 168, 117 = 886 Difference: 886 - 833 = 53 But the optimal difference is 35. Right partition: Group A: 301, 300, 241 = 842 Group B: 301, 173, 168, 118, 117 = 877 Difference: 877 - 842 = 35. – Stefan4024 Mar 08 '13 at 17:43
Fair point - which is why this is in the class of NP problems. Thanks for the feedback. – Neil Townsend Mar 08 '13 at 18:31

score 1 · Accepted Answer · edited May 23 '17 at 12:28

Using the constraint of only two groups zour problem is already solved if you can find one grouping of numbers whichs sum is exactlz half of the total. thus I suggest you try to find this group and put the remainder into the other group obviously.

The assumption to put the biggest number in the first group is simple. Now the rest is more tricky.

This is simple in the binary system: Consider that for each number you have a bit. The bit beeing 1 signals that the number is in group A, otherwise it is in group B. The entire distribution can be described by a concatination of these bits. This can be considered a number. SO to check all combinations you have to go through all the numbers and calculate the combination.

code:

#include <iostream>
#include <memory>
using namespace std;

int partition(const std::unique_ptr<int[]>& numbers, int elements) {
    int sum = 0;

    for(int i=0; i<elements; ++i) {
        sum += numbers[i];
    }

    double average = sum/2.0;
    double closest = average+.5;

    int beststate = 0;


    for(int state=1; state< 1<<(elements-1);++state) { 
        int tempsum = 0;
        for(int i=0; i<elements; ++i) {
            if( state&(1<<i) ) {
                tempsum += numbers[i];
            }
        }

        double delta=abs(tempsum-average);
        if(delta < 1) { //if delta is .5 it won't get better i.e. (3,5) (9) => average =8.5
            cout << state;
            return state;
        }

        if(delta<closest) {
            closest   = delta;
            beststate = state;
        }
    }

    return beststate;
}

void printPartition(int state, const std::unique_ptr<int[]>& numbers, int elements) {
    cout << "(";
    for(int i=0; i<elements; ++i) {
        if(state&(1<<i)) {
            cout << numbers[i]<< ",";
        }
    }
    cout << ")" << endl;
}

int main()
{
    int elements;

    cout << "number of elements:";
    cin >> elements;

    std::unique_ptr<int[]> numbers(new int[elements]);

    for(int i = 0; i<elements; i++)
    {
        cin >> numbers[i];
    }

    int groupA = partition(numbers, elements);

cout << "\n\nSolution:\n";
    printPartition(groupA, numbers, elements);
    printPartition(~groupA,numbers, elements);
    return 0;
}

edit: For further (and better) solutions to generating all possibilities check this awnser. Here is a link to knuths book which I found here

edit2: To explain the concept of enumeration as requested:

suppose we have three elements, 1,23,5. all possible combinations disregarding permutations can be generated by filling out a table:

1 | 23 | 5         Concatination   Decimal interpretation
-----------
0 |  0 | 0         000                0
0 |  0 | 1         001                1
0 |  1 | 0         010                2
0 |  1 | 1         011                3
1 |  0 | 0         100                4
1 |  0 | 1         101                5
1 |  1 | 0         110                6
1 |  1 | 1         111                7

If we now take for instant the number 4 this maps to 100 which says that the first number is in group A and the second and third number are not (which implies they are in group B). Thus A is 1 while B is 23,5.

Now to explain the trick why I only need to look at half: if we look at the decimal interpretation 3 (011 binary) we get for Group A 23,5 and for Group B 1. If we compare this to the example for 4 we notice that we have the same numbers grouped, just in the exactly opposite group names. Since this makes no difference for your problem we don't have to look at this.

edit3: I added real code to try out, in the pseudocode i made the wrong assumption that i would always include the first element in the sum, which was wrong. As for your code that I started out on: you can't allocate arrays like that. Another solution instead of an Array would be a vector<int> which avoids the problems of having to pass the arraysize to the functions. Using this would be a great improvement. Furthermore this code is far from good. You will run into issues with int size (usually this should work for up to 32 elements). You can work arround this though (try this maybe How to handle arbitrarily large integers). Or you actually read up on knuth (see above) I am sure you will find some recursive approach. THis code is also slow, since it always rebuilds the whole sum. One optimization would be to look into gray codes (I think Knuth describes them aswell). That way you only have to add/substract one number per permutation you test. That would be a performance boost in the order of n, since you replace n-1 additions with 1 addition/substraction.

How can I use simulated Annealing to solve this problem? I've never used this method, so can you explain a little bit. — Stefan4024, Mar 08 '13 at 11:46
@Stefan4024 It calculates the sum of the current group `A`. State, is a concatination of the state bits. So the nth-bit indicates weather or not the nth-number is in group a. To check this, a bitmask is used to check if the nth bit is set to `1`. If it is, we include it in the sum of group A. — ted, Mar 08 '13 at 12:07
@Stefan4024 This is lazy pseudocode one could optimize by just adding or subtracting the element that is added or removed from the last state. — ted, Mar 08 '13 at 12:08
I actually don't understand how you run through all combination and how you switch number between group A and group B. — Stefan4024, Mar 08 '13 at 18:18
Also I think simulated annealing isn't the right solution for this problem, because it can generate different answers. So sometimes the difference can be 2, but sometimes 0 in the example i showed in the original post. — Stefan4024, Mar 08 '13 at 18:23
@Stefan4024 yes simulated annealing is only heuristic. What part about all solution don't you understand? — ted, Mar 08 '13 at 18:27
I updated the answer, maybe it is clearer now, take a look at the bottom. — ted, Mar 08 '13 at 18:39
Thanks man, i've learned something about simulated annealing and combinatiorics in C++. I know this is a good idea, but unfortunly this isn't the optimal soluton to this problem. — Stefan4024, Mar 08 '13 at 18:43
@Stefan4024 in the bottom I describe how to do it deterministic (no simulated annealing) it will check _all_ the solutions, it only quits early if it finds an optimal solution (both groups have exactly the same size). — ted, Mar 08 '13 at 18:46
I uploaded the code in the original post. Take a look at it. It's different from yours pseudocode, but works on the same basis. But when I input bigger numbers, the result is different. Can you check the code? — Stefan4024, Mar 08 '13 at 20:28
I am tired now, and won't check the whole thing, some quick pointers though: you don't need a binary conversion. Try what the code does without calling that. I have difficulties explaining this, but conversion is only needed if you wan't to make a string out of the number. The computer stores numbers in binary anyways. Second thing: watch out with the average. instead of doing an integer devision `(integer/2)` which results in `floor(integer/2)` and offsetting this by adding .5 try a real float division, i.e. `static_cast(integer)/2` (`integer/2.0` might work but my cpp is rusty) — ted, Mar 08 '13 at 20:41
I've done that with the average and it works with integer/2.0. But I don't understand your first note. But anyway it should work. But it doesn't. With the example in DwB answer it computes 18, instead of 0. — Stefan4024, Mar 08 '13 at 20:56
I edited my awnser check out the code _and_ the footnotes at the very bottom. I removed the simulated annealing part to make the post more readable. Please tell me if this works for you. — ted, Mar 08 '13 at 22:27

score 0 · Answer 3 · answered Mar 06 '13 at 22:15

If the average value of the individual group is same as the average of the complete set, then obviously the difference between the two group will be less. By using this we can bring up a algorithm for this problem.

Get the average of the complete set.
Take the largest value in the set and put it in one of the group.
Get the difference of the average of the individual group and the average of the entire set.
Place the next largest number in the group which is having the maximum difference.
Repeat the steps 3 and 4 until all the numbers are placed.

This will be the efficient approach to get the near optimal solution.

Can you explain step 3 and 4 a little bit better? – Stefan4024 Mar 07 '13 at 22:46 — Stefan4024, Mar 07 '13 at 22:46

score 0 · Answer 4 · answered Mar 08 '13 at 18:53

0

How about this:

Sort the list of numbers.
Put the largest number in group A. Remove that number from the list.
If the sum of all numbers in group A is less than the sum of all numbers in group B, goto 2.
Put the largest number in group B. Remove that number from the list.
If the sum of all numbers in group B is less than the sum of all numbers in group A, goto 4.
If more than zero numbers remain in the list, goto 2.

answered Mar 08 '13 at 18:53

DwB

37,124
11
56
82

19 39 40 49 319 315 295 244 222 198 184 182 172 143 139 120 97 71 64 49 40 39 1. Put 319 in group A (319) 2. Put 315 in group B (315) 3. Put 295 in group B (610) 4. Put 244 in group A (563) 5. Put 222 in group A (785) 6. Put 198 in group B (808) 7. Put 184 in group A (969) 8. Put 182 in group B (990) 9. Put 172 in group A (1141) 10. Put 143 in group B (1133) 11. Put 139 in group B (1272) 12. Put 139 in group A (1280) 13. Put 120 in group B (1392) 14. Put 97 in group A (1377) 15. Put 71 in group A (1448) 16. Put 64 in group B (1456) – Stefan4024 Mar 08 '13 at 20:48
17. Put 49 in group A (1497) 18. Put 40 in group B (1496) 19. Put 39 in group B (1535) Difference is 38, although the optimal difference is 0. – Stefan4024 Mar 08 '13 at 20:48

Group the numbers C++

4 Answers4