This is a function version of the partition problem, so it is FNP-complete (and thus at least as hard as NP-complete). You can also phrase the question as a function version of Subset-Sum. Luckily, it has a pseudopolynomial solution and often solvable quickly in practice.
It's more useful to consider an equivalent form of your problem: Given a multiset of integers S
, whose sum is T
, find a subset of S
whose sum is at most T/2
and as close to T/2
as possible. This subset is just the subset with smaller sum from your problem, so the other subset is the rest of S
.
Given an algorithm (such as the one you link in the other post) that merely finds the optimal-sum-at-most-half (or, equivalently, the minimum absolute difference), there is usually a straightforward way to modify it to get the actual subset. When generating a list of subset-sums, also store the index of an element that generated that subset-sum. Then, at the end, we can backtrack from our final found sum to recover the elements.
For a simple example, if our array is [1, 5, 8], we might generate all subset sums by adding elements one at a time:
Sums-Dict: Hashmap from subset-sums to last added element for that sum.
Initial Sums: {0: null}
Add element 1:
Sums-Dict = {0: null, 1: 1}
Add element 5:
Sums-Dict = {0: null, 1: 1, 5: 5, 6: 5}
Add element 8:
Sums-Dict = {0: null, 1: 1, 5: 5, 6: 5, 8: 8, 9: 8, 13: 8, 14: 8}
Then to backtrack, we use a process similar to backtracking in the knapsack-problem to output a solution:
Find a closest sum to (1+5+8)/2, for example '6'.
Backtracking:
Used-elements = [], Current Sum = 6.
Sums-Dict[6] = 5: add 5 to used-elements, subtract 5 from current sum
Used-elements = [5], Current Sum = 1.
Sums-Dict[1] = 1: add 1 to used-elements, subtract 1 from current sum
Used-elements = [5, 1], Current Sum = 0.
Sums-Dict[0] = null: stop. We have found the subset that summed to '6'.
The classic dynamic-programming solution can be trivially modified to store that extra information for each sum; if the numbers are small, this is your best bet. I've included a Python implementation for subset-sum with a core algorithm and notation based on the Schroeppel-Shamir paper. This is a meet-in-the-middle version of the naive inclusion/exclusion solution to subset-sum. It's more complex than the brute-force approach, but runs in O(2^(n/2) * n/4)
time and takes O(2^(n/4))
space, so is a practical solution for larger inputs.
from typing import Dict, List, Tuple
import collections
import math
import heapq
class SubsetSumSolver:
"""Solves the subset-sum and partition optimization problems.
Useful when values or goal sum are too large for dynamic programming"""
def __init__(self, nums: List[int]):
# Strip all zeros. Not necessary, but a useful optimization for speed
self.orig_zeros = nums.count(0)
self.nums = sorted(x for x in nums if x != 0)
self.n = len(self.nums)
def all_subset_sums(self, left_bound: int, right_bound: int) -> Dict[int, int]:
"""Return a subset-sum dictionary, mapping subset-sums of
nums[left_bound:right_bound] to any index of an element in that subset-sum."""
all_sums = {0: self.n + 1}
for i in range(left_bound, right_bound):
# Want old sums to remain/take priority
new_sums = {self.nums[i] + elem: i for elem in all_sums}
new_sums.update(all_sums)
all_sums = new_sums
return all_sums
def recover_sum_members(self, sum_dict: Dict[int, int], found_sum: int) -> List[int]:
"""Given a subset-sum dictionary and a sum, return a set of elements
from nums that formed that sum."""
answer = []
curr_sum = found_sum
while curr_sum != 0:
next_elem_index = sum_dict[curr_sum]
next_elem = self.nums[next_elem_index]
answer.append(next_elem)
curr_sum -= next_elem
assert len(answer) <= self.n
return answer
def min_absolute_difference(self, goal: float) -> List[int]:
"""Implement Schroeppel and Shamir alg. for subset sum
Runs in O(2^(n/2) * n/4) time, takes O(2^(n/4)) space
Returns a subset of self.nums whose sum is as close to goal as possible.
"""
if self.n < 8:
# Direct solution when n < 8; not worth splitting into 4 groups.
all_sums_dict = self.all_subset_sums(0, self.n)
best_diff_seen, best_sum = min(((abs(x - goal), x) for x in all_sums_dict),
key=lambda pair: pair[0])
return self.recover_sum_members(all_sums_dict, best_sum)
# Split nums into 4 equal-length parts (or as close as possible)
half = self.n // 2
bounds = [0, half // 2, half, half + (self.n - half) // 2, self.n]
# first_arr = nums[bounds[0]:bounds[1]]
# sec_arr = nums[bounds[1]:bounds[2]]
# third_arr = nums[bounds[2]:bounds[3]]
# fourth_arr = nums[bounds[3]:bounds[4]]
first_table_dict = self.all_subset_sums(bounds[0], bounds[1])
first_table = list(first_table_dict.keys())
sec_table_dict = self.all_subset_sums(bounds[1], bounds[2])
sec_table = sorted(sec_table_dict.keys())
third_table_dict = self.all_subset_sums(bounds[2], bounds[3])
third_table = list(third_table_dict.keys())
fourth_table_dict = self.all_subset_sums(bounds[3], bounds[4])
fourth_table = sorted(fourth_table_dict.keys(), reverse=True)
# min_heap stores pairs of problems from T1 and T2, and
# makes the pair with smallest sum in front of the heap
# Format: (sum, T1-index, T2-index) triplets
min_heap = [(x + sec_table[0], i, 0) for i, x in enumerate(first_table)]
# max_heap stores pairs of problems from T3 and T4, and
# makes the pair with largest sum in front of the heap
# Format: (-sum, T3-index, T4-index) triplets
max_heap = [(-(x + fourth_table[0]), i, 0) for i, x in enumerate(third_table)]
heapq.heapify(min_heap)
heapq.heapify(max_heap)
best_diff_seen = math.inf
best_diff_indices = []
while len(min_heap) != 0 and len(max_heap) != 0:
smallest, p1, p2 = min_heap[0]
largest, p3, p4 = max_heap[0]
largest = -largest
ans_here = smallest + largest
if abs(goal - ans_here) < best_diff_seen:
best_diff_seen = abs(goal - ans_here)
best_diff_indices = (p1, p2, p3, p4)
if ans_here <= goal: # Want sum to increase, so increase p2
heapq.heappop(min_heap)
if p2 + 1 != len(sec_table):
heapq.heappush(min_heap,
(first_table[p1] + sec_table[p2 + 1], p1, p2 + 1))
else: # Want sum to decrease. so increase p4
heapq.heappop(max_heap)
if p4 + 1 != len(fourth_table):
heapq.heappush(max_heap,
(-(third_table[p3] + fourth_table[p4 + 1]), p3, p4 + 1))
sum_ans = []
p1, p2, p3, p4 = best_diff_indices
sum_ans.extend(self.recover_sum_members(first_table_dict, first_table[p1]))
sum_ans.extend(self.recover_sum_members(sec_table_dict, sec_table[p2]))
sum_ans.extend(self.recover_sum_members(third_table_dict, third_table[p3]))
sum_ans.extend(self.recover_sum_members(fourth_table_dict, fourth_table[p4]))
return sum_ans
def solve_partition(self) -> Tuple[List[int], List[int]]:
"""Return a partition of nums into (smaller_sum_set, larger_sum_set)
Finds a partition whose sum-difference is minimum.
"""
total_sum = sum(self.nums)
frequency_counts = collections.Counter(self.nums)
first_subset = self.min_absolute_difference(goal=total_sum / 2.0)
if self.orig_zeros != 0:
first_subset.extend([0] * self.orig_zeros)
remaining_subset = frequency_counts - collections.Counter(first_subset)
remaining_subset = list(remaining_subset.elements())
if sum(first_subset) <= sum(remaining_subset):
return (first_subset, remaining_subset)
else:
return (remaining_subset, first_subset)
You can call it like so, on any array of integers (practical up to n=100 elements):
Solver = SubsetSumSolver([1, 5, 5, 6, 7, 10, 20])
print(Solver.solve_partition())
>>> ([10, 6, 5, 5, 1], [7, 20])