Partition a set into two subsets so that the difference of the sum is minimum and return the two subsets

Question

There are many questions related to this, e.g. here.

However, all of the answers focus on finding the minimum absolute sum. I'm trying to use some of the approaches laid out in the various answers, but I care less about returning the sum and more about returning the two subsets (or the first found best if there are multiple solutions).

Is there a way to do this without it being NP-hard?

My use-case is that I have a set of elements of various heights, and I need to arrange those elements into two columns (two arrays) to minimize the total maximum height of the elements when they are stacked next to each other. I have a solution that gives a decent result most of the time, but it isn't always accurate. I'm hoping to not have to calculate every single combination of two sets to get to an ideal solution.

*I'm aware there is optimization to be had below on removing items from the array and calculating the sum.

// Input is always ordered by size
let inputWorking = [{
    item: 'a',
    size: 5
  },
  {
    item: 'b',
    size: 6
  },
  {
    item: 'c',
    size: 7
  },
  {
    item: 'd',
    size: 8
  },
];

let inputNotWorking = [{
    item: 'a',
    size: 14
  },
  {
    item: 'b',
    size: 14
  },
  {
    item: 'c',
    size: 20
  },
  {
    item: 'd',
    size: 20
  },
  {
    item: 'e',
    size: 21
  },
];


console.log('Output is correct', pushSides(inputWorking));

// Best fit would be [a,b,c], [d,e] with a max height of 48
console.log('Output is incorrect', pushSides(inputNotWorking));


function pushSides(items, ls = [], rs = []) {
  if (items.length === 0) {
    const leftSize = this.sumSizes(ls);
    const rightSize = this.sumSizes(rs);
    return {
      left: ls,
      right: rs,
      maxHeight: leftSize > rightSize ? leftSize : rightSize
    };
  }

  const lastItem = items[items.length - 1];
  const result = this.pushToIdealSide(lastItem, ls, rs);
  // Remove item we used
  items = items.filter(t => t !== lastItem);
  return pushSides(items, result.ls, result.rs);
}

function pushToIdealSide(nextItem, ls = [], rs = []) {
  if (this.sumSizes(rs) + nextItem.size > this.sumSizes(ls) + nextItem.size) {
    ls.push(nextItem);
  } else {
    rs.push(nextItem);
  }

  return {
    ls,
    rs
  };
}

function sumSizes(itemSizeArray) {
  return itemSizeArray.map(c => c.size)
    .reduce((prev, curr) => prev + curr, 0);
}

One approach that comes to mind is to find the value of the 2 sums that have the least absolute difference and then going through the array and trying to take numbers such that they add up to the sum of one of those 2 values. The remaining elements are your second partition. — user1984, Nov 11 '21 at 00:54
The hard part is ` trying to take numbers such that they add up to the sum of one of those 2 values` you end up with the same problem having to try a lot of combiantions. For example if your sum for one side is 11 and you have 4,4,2,11... That would be a brute force approach, perhaps could have some optimizations, but i'd be just as well off creating every possible combination of arrays... No? — Ray Suelzer, Nov 11 '21 at 01:03
As I understand our partitions don't need to by contiguous subarrays. So we can sort and try to find one of the numbers using a 2 pointer approach? I'm not sure if we can do it greedily. The fact that we know for sure that a subset of the array adds up to one of those nums may help here. — user1984, Nov 11 '21 at 01:09
You can use dynamic programming to solve the subset sum problem, provided that the sum of all the heights is a reasonably small integer. — user3386109, Nov 11 '21 at 02:41
Does this help - https://www.geeksforgeeks.org/partition-a-set-into-two-subsets-such-that-the-difference-of-subset-sums-is-minimum/ ? — AKSingh, Nov 11 '21 at 04:30

score 2 · Accepted Answer · answered Nov 11 '21 at 19:03

This is a function version of the partition problem, so it is FNP-complete (and thus at least as hard as NP-complete). You can also phrase the question as a function version of Subset-Sum. Luckily, it has a pseudopolynomial solution and often solvable quickly in practice.

It's more useful to consider an equivalent form of your problem: Given a multiset of integers S, whose sum is T, find a subset of S whose sum is at most T/2 and as close to T/2 as possible. This subset is just the subset with smaller sum from your problem, so the other subset is the rest of S.

Given an algorithm (such as the one you link in the other post) that merely finds the optimal-sum-at-most-half (or, equivalently, the minimum absolute difference), there is usually a straightforward way to modify it to get the actual subset. When generating a list of subset-sums, also store the index of an element that generated that subset-sum. Then, at the end, we can backtrack from our final found sum to recover the elements.

For a simple example, if our array is [1, 5, 8], we might generate all subset sums by adding elements one at a time:

Sums-Dict: Hashmap from subset-sums to last added element for that sum.
Initial Sums: {0: null}

Add element 1:
Sums-Dict = {0: null, 1: 1}

Add element 5:
Sums-Dict = {0: null, 1: 1, 5: 5, 6: 5}

Add element 8:
Sums-Dict = {0: null, 1: 1, 5: 5, 6: 5, 8: 8, 9: 8, 13: 8, 14: 8}

Then to backtrack, we use a process similar to backtracking in the knapsack-problem to output a solution:

Find a closest sum to (1+5+8)/2, for example '6'.

Backtracking:
Used-elements = [], Current Sum = 6.
Sums-Dict[6] = 5: add 5 to used-elements, subtract 5 from current sum

Used-elements = [5], Current Sum = 1.
Sums-Dict[1] = 1: add 1 to used-elements, subtract 1 from current sum


Used-elements = [5, 1], Current Sum = 0.
Sums-Dict[0] = null: stop. We have found the subset that summed to '6'.

The classic dynamic-programming solution can be trivially modified to store that extra information for each sum; if the numbers are small, this is your best bet. I've included a Python implementation for subset-sum with a core algorithm and notation based on the Schroeppel-Shamir paper. This is a meet-in-the-middle version of the naive inclusion/exclusion solution to subset-sum. It's more complex than the brute-force approach, but runs in O(2^(n/2) * n/4) time and takes O(2^(n/4)) space, so is a practical solution for larger inputs.

from typing import Dict, List, Tuple
import collections
import math
import heapq


class SubsetSumSolver:
    """Solves the subset-sum and partition optimization problems.
    Useful when values or goal sum are too large for dynamic programming"""

    def __init__(self, nums: List[int]):

        # Strip all zeros. Not necessary, but a useful optimization for speed
        self.orig_zeros = nums.count(0)
        self.nums = sorted(x for x in nums if x != 0)

        self.n = len(self.nums)

    def all_subset_sums(self, left_bound: int, right_bound: int) -> Dict[int, int]:
        """Return a subset-sum dictionary, mapping subset-sums of
        nums[left_bound:right_bound] to any index of an element in that subset-sum."""

        all_sums = {0: self.n + 1}
        for i in range(left_bound, right_bound):
            # Want old sums to remain/take priority
            new_sums = {self.nums[i] + elem: i for elem in all_sums}
            new_sums.update(all_sums)
            all_sums = new_sums

        return all_sums

    def recover_sum_members(self, sum_dict: Dict[int, int], found_sum: int) -> List[int]:
        """Given a subset-sum dictionary and a sum, return a set of elements
        from nums that formed that sum."""

        answer = []
        curr_sum = found_sum
        while curr_sum != 0:
            next_elem_index = sum_dict[curr_sum]
            next_elem = self.nums[next_elem_index]
            answer.append(next_elem)
            curr_sum -= next_elem

            assert len(answer) <= self.n

        return answer

    def min_absolute_difference(self, goal: float) -> List[int]:
        """Implement Schroeppel and Shamir alg. for subset sum
        Runs in O(2^(n/2) * n/4) time, takes O(2^(n/4)) space
        Returns a subset of self.nums whose sum is as close to goal as possible.
        """

        if self.n < 8:
            # Direct solution when n < 8; not worth splitting into 4 groups.
            all_sums_dict = self.all_subset_sums(0, self.n)
            best_diff_seen, best_sum = min(((abs(x - goal), x) for x in all_sums_dict),
                                           key=lambda pair: pair[0])
            return self.recover_sum_members(all_sums_dict, best_sum)

        # Split nums into 4 equal-length parts (or as close as possible)
        half = self.n // 2
        bounds = [0, half // 2, half, half + (self.n - half) // 2, self.n]
        # first_arr = nums[bounds[0]:bounds[1]]
        # sec_arr = nums[bounds[1]:bounds[2]]
        # third_arr = nums[bounds[2]:bounds[3]]
        # fourth_arr = nums[bounds[3]:bounds[4]]

        first_table_dict = self.all_subset_sums(bounds[0], bounds[1])
        first_table = list(first_table_dict.keys())

        sec_table_dict = self.all_subset_sums(bounds[1], bounds[2])
        sec_table = sorted(sec_table_dict.keys())

        third_table_dict = self.all_subset_sums(bounds[2], bounds[3])
        third_table = list(third_table_dict.keys())

        fourth_table_dict = self.all_subset_sums(bounds[3], bounds[4])
        fourth_table = sorted(fourth_table_dict.keys(), reverse=True)

        # min_heap stores pairs of problems from T1 and T2, and
        # makes the pair with smallest sum in front of the heap
        # Format: (sum, T1-index, T2-index) triplets
        min_heap = [(x + sec_table[0], i, 0) for i, x in enumerate(first_table)]

        # max_heap stores pairs of problems from T3 and T4, and
        # makes the pair with largest sum in front of the heap
        # Format: (-sum, T3-index, T4-index) triplets
        max_heap = [(-(x + fourth_table[0]), i, 0) for i, x in enumerate(third_table)]

        heapq.heapify(min_heap)
        heapq.heapify(max_heap)

        best_diff_seen = math.inf
        best_diff_indices = []

        while len(min_heap) != 0 and len(max_heap) != 0:
            smallest, p1, p2 = min_heap[0]
            largest, p3, p4 = max_heap[0]
            largest = -largest

            ans_here = smallest + largest
            if abs(goal - ans_here) < best_diff_seen:
                best_diff_seen = abs(goal - ans_here)
                best_diff_indices = (p1, p2, p3, p4)

            if ans_here <= goal:  # Want sum to increase, so increase p2
                heapq.heappop(min_heap)
                if p2 + 1 != len(sec_table):
                    heapq.heappush(min_heap,
                                   (first_table[p1] + sec_table[p2 + 1], p1, p2 + 1))
            else:  # Want sum to decrease. so increase p4
                heapq.heappop(max_heap)
                if p4 + 1 != len(fourth_table):
                    heapq.heappush(max_heap,
                                   (-(third_table[p3] + fourth_table[p4 + 1]), p3, p4 + 1))

        sum_ans = []
        p1, p2, p3, p4 = best_diff_indices

        sum_ans.extend(self.recover_sum_members(first_table_dict, first_table[p1]))
        sum_ans.extend(self.recover_sum_members(sec_table_dict, sec_table[p2]))
        sum_ans.extend(self.recover_sum_members(third_table_dict, third_table[p3]))
        sum_ans.extend(self.recover_sum_members(fourth_table_dict, fourth_table[p4]))

        return sum_ans

    def solve_partition(self) -> Tuple[List[int], List[int]]:
        """Return a partition of nums into (smaller_sum_set, larger_sum_set)
        Finds a partition whose sum-difference is minimum.
        """
        total_sum = sum(self.nums)
        frequency_counts = collections.Counter(self.nums)
        first_subset = self.min_absolute_difference(goal=total_sum / 2.0)
        if self.orig_zeros != 0:
            first_subset.extend([0] * self.orig_zeros)

        remaining_subset = frequency_counts - collections.Counter(first_subset)
        remaining_subset = list(remaining_subset.elements())
        if sum(first_subset) <= sum(remaining_subset):
            return (first_subset, remaining_subset)
        else:
            return (remaining_subset, first_subset)

You can call it like so, on any array of integers (practical up to n=100 elements):

Solver = SubsetSumSolver([1, 5, 5, 6, 7, 10, 20])
print(Solver.solve_partition())

>>> ([10, 6, 5, 5, 1], [7, 20])

Partition a set into two subsets so that the difference of the sum is minimum and return the two subsets

1 Answers1