Python split a list into N chunks with approximately equal sums

Question

I know this was kinda asked a lot, but this one adds a little spice because, in all the questions and answers I saw, the number of items in each sublist (or chunk) isn't always the same.

What I'm looking for is:

You have a list: [199, 139, 191, 140, 67, 437, 111, 122, 158, 123, 165, 138, 117, 132]
It must be split into 2 chunks, where the sums of each chunk are approximately equal, even if the difference is 100 or 200, each chunk must have 7 items (other answers I have found split it into 8-6 or even 9-5 just to have a sum difference of 20-50)

Do you need to find the most equal split, or just one that's "good enough"? What is the greater context for the problem - why are you trying to solve it? In general will there always be two chunks, or what? — Karl Knechtel, Jan 31 '21 at 13:35
make chunks as many as you want and keep one variable for each chunk, that is sum of that chunk. now loop through the array and insert item into lowest summed chunk. — tard, Jan 31 '21 at 13:37
[199, 139, 191, 158, 165, 138, 132] and [140, 67, 437, 111, 122, 123, 117] are optimal with sum difference 5. — superb rain, Jan 31 '21 at 13:44
This is for a personal script, that reads data from google sheets, does a word count, then split them in chunks with equal words, the list you guys see is a list of lines, sometimes X words are in 67 lines, other times the same X words are in 400 lines, I just need an automated way to split them, because there are more sheets coming. — Amine Messabhia, Jan 31 '21 at 14:53
So you are not so much asking for *help* as asking others to do your work for you. — Scott Hunter, Jan 31 '21 at 14:56

Daniel Hao · Accepted Answer · 2021-01-31T18:10:27.770

This is algorithmic approach (w/o any lib) and it's faster too. Performance testing with 20-30 items list, it only took less than 1 second (earlier post will take 18-19 seconds!) It's about 50-70 times faster.

It still has room to be improved, but for a reasonable size of list (40ish), it's performant well enough (<1.5 seconds).

import time
def split_list(lst):
    '''split the list to 2 sublists with minimum difference '''
    
    iterations = range(2, len(lst)//2+1)

    totals = sum(lst)
    half = totals // 2; old_moves = {}
    
    half_lambda = lambda i: (abs((i)- half), i)

    for num in lst:
        left = lst[:]
        left.remove(num)
        old_moves[num] = left
    #print(f' old_moves: {old_moves} ')

    if iterations == []:         # float() ?
        ans = min(map(half_lambd, old_moves.keys()))
        return (ans[1], sum(old_moves[ans[1]]), old_moves[ans[1]])

    for n in iterations:
        new_moves = {}
        
        for total, roster in old_moves.items():
            for n in roster:
                left = roster[:]
                left.remove(n)
                new_total = total + n
                if new_total > half: continue
                new_moves[new_total] = left
        old_moves = new_moves

    ans = min(map(half_lambda, old_moves.keys()))
    another = list(set(lst) - set(old_moves[ans[1]]))
    #print(f' another: {another} ')
    return ans[1], sum(old_moves[ans[1]]), old_moves[ans[1]], another
                                            #old_moves[ans[0]])
lst  = [199, 139, 191, 140, 67, 437, 111, 122, 158, 123, 165, 138, 117, 132,
        990, 998, 888, 884, 332, 338]       # it only take 0.12 seconds

start = time.time()
print(split_list(lst))       # running the function
print(time.time() - start)   # in Python3.8 Aspire E15 machine (3.9 even faster)

This is definitely a much more scalable solution than mine. With a list being 14C7 there are only 3432 combinations but that number increases quickly as more elements are added. — Shane Springer, Jan 31 '21 at 19:03

Shane Springer · Answer 2 · 2021-01-31T19:05:01.757

Could do something like this

from statistics import mean 
from itertools import combinations

main = [199, 139, 191, 140, 67, 437, 111, 122, 158, 123, 165, 138, 117, 132]

combi = []

minimum = sum(main)

for i in combinations(main, 7):
    if abs(mean(i)-mean(main))  < minimum:
        minimum = abs(mean(i)-mean(main))
        combi = i

combi = list(combi)

for i in combi:
    main.remove(i)


print(combi, mean(combi), sum(combi))

print(main, mean(main), sum(main))

Gives a value "minimum" an arbitrary value of the sum of the list. itertools combinations returns all of the possible 7 item combinations of the main list, if the mean of that new 7 item list is less than the minimum it redefines the new minimum and stores those 7 values in a new list "combi".

After it finishes iterating through all of the combinations what is stored in combi will be the optimal 7 values to have the mean of both lists the closest. these values are then removed from the main list to get the other half.

Edit: See Daniel Hao's answer for a much more scalable solution

Thiis `won't` work for a long list with just 20+ numbers, because the combinations. It should take algorithmic approach instead... Try some list like this - [199, 139, 191, 140, 67, 437, 111, 122, 158, 123, 165, 138, 117, 132, 999, 998, 888, 886, 334, 336, 389, 390, 422, 424, 522, 524, 824, 776, ] And see what's happening... — Daniel Hao, Jan 31 '21 at 17:19
Could also use `min` with `key`, like `combi = min(combinations(main, 7), key=lambda i: abs(mean(i)-mean(main)))`. — superb rain, Jan 31 '21 at 18:31

Python split a list into N chunks with approximately equal sums

2 Answers2