0

I am trying to obtain all the combinations of values in a list, for which the sum of the corresponding values in another list do not exceed a certain limit. As an example, I have certain jobs that I want to produce in a batch, but that batch has a limited capacity, meaning that only certain combinations of jobs can be put together so that their size don't exceed the capacity of the batch.

Imagining I have jobs defined as job 1, job 3, and job 5, I have the following list:

jobs = [1,3,5]

If job 1 has size 1, job 3 has size 3, and job 5 has size 2, I have the following list:

jobSize = [1,3,2]

If my batch has maximum capacity of 5, this means the desired output is the following:

combinations = [(1),(3),(5),(1,3),(1,5),(3,5)] 
#here I use the name/number of the jobs, and (1,3,5) not included cause total size of 5 is exceeded

From what I have seen, something similar (not exactly the same) is possible to do using combinations from itertools, like explained here Powersets in Python using itertools or here How to get all possible combinations of a list’s elements?, but this is definitely not as efficient as I would like.

I also tried something with itertools.product and np.where, like explained here Python Numpy: All combinations from array or scalar by Corralien, but once again, this is not exactly what I need, and requires a lot of post-processing operations, ending up being too slow.

My idea is that this is possible to do somehow with numpy arrays, because it is the only way I can see this being fast. Does anyone have any idea?

  • numpy arrays are not the solution ...the problem is all of the solutions listed require full exploration of your problem space and that is what takes too long... try and think of ways you can shrink your problem space and eliminate many combinations right away ... for example if you ha the following `jobSize=[1,1,3,25]` an your max was 7 ... you probably dont need to generate any combinations that include the 25 so you have reduced your search space significantly (drops several potential solutions without having to generate any permutations) – Joran Beasley Sep 29 '22 at 03:52
  • What is the size of jobs and jobSize? – mozway Sep 29 '22 at 04:01
  • @mozway the size of jobs and jobSize is the same thing. So, if I decide to produce jobs 1 and 3, since they have sizes 1 and 3 respectively, the total size will be 4, respecting the total capacity of the batch, which is 5. But if I decide to produce jobs 1, 3 and 5, since they have sizes 1, 3 and 2 respectively, that will not be possible as the total size will be 6, exceeding the total capacity of the batch. – Paulo Nascimento Sep 29 '22 at 04:08
  • Let me rephrase, how many jobs do you have in real life? – mozway Sep 29 '22 at 04:10

2 Answers2

1

You can approach this using a 2D matrix of 0/1 where 1 indicates that the job is selected to run:

from sklearn.utils.extmath import cartesian

jobs = [1,3,5,7,9]
jobSize = [1,3,2,2,5]
target = 5

# produce all combinations of 0/1
# 1 column per job
m = cartesian([[0,1]]*len(jobSize))

# compute sum of job costs
total = (jobSize*m).sum(axis=1)

# get combinations with total cost ≤ target
idx = np.nonzero(total<=target)[0]
out = m[idx]

Output:

# job   1  3  5  7  9
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1],
       [0, 0, 0, 1, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 1, 1, 0],
       [0, 1, 0, 0, 0],
       [0, 1, 0, 1, 0],
       [0, 1, 1, 0, 0],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 1, 0],
       [1, 0, 1, 0, 0],
       [1, 0, 1, 1, 0],
       [1, 1, 0, 0, 0]])

If you want the output as lists, you can further use itertools.compress:

from itertools import compress
out = [list(compress(jobs, l)) for l in m[idx]]

output:

[[], [9], [7], [5], [5, 7], [3], [3, 7], [3, 5], [1], [1, 7], [1, 5], [1, 5, 7], [1, 3]]
mozway
  • 194,879
  • 13
  • 39
  • 75
1

I think the easiest approach is to use a recursive function. You take one job away from your list, and run the function on the remaining sublist. Then, you combine the results from the sublist with the first job accordingly.

In this way, if a combination is invalid – for example (1,2,3) when max is 5 – it will not show up in the results from the sublist, and so all longer combinations containing (1,2,3) are eliminated.

This could be improved by:

  • changing from recursive to iterative
  • using multiprocessing. For example instead of taking 1 job – taking 3 first jobs, this gives 6 combinations which would need to be combined with the results from the sublist processing. This could be seperated into processes running at the same time
  • take advantage of the fact that jobs may have the same jobSize. In which case the results are symmetric

Code:

import sys
import time
from collections import namedtuple

def print_job(self):
    #return f'( /{self.job_name}/ {self.size} )'
    return str(self.job_name)+' ('+str(self.size)+')'
def print_solution(self):
    return ' '.join([str(job) for job in self.jobs]) # +'\n'
Job = namedtuple('Job', ['job_name','size'])
Solution = namedtuple('Solution', ['jobs', 'Size'])
Job.__repr__ = print_job
Solution.__repr__ = print_solution

########### main function #################################
def find_sets(job_list, max_size):
    first_job = job_list[0]
    new_solutions = []
    if len(job_list) > 1:
        other_solutions = find_sets(job_list[1:], max_size)
        for solution in other_solutions:
            if solution.Size + first_job.size <= max_size:
                new_solution = Solution(solution.jobs + [first_job], solution.Size + first_job.size)
                new_solutions.append(new_solution)
    else:
        other_solutions = []
    if first_job.size <= max_size:
            new_solutions.append(Solution([first_job], first_job.size))
    return other_solutions + new_solutions


def find_all_job_sets(jobs='ABC', jobSize = (1,3,2), max_size=5, show_progress=True):
    sys.setrecursionlimit(1500)
    if (len(jobs) != len(jobSize)):
        print('Number of jobs and jobSizes must match.')
        exit(1)
    jobList = list(Job(name, size) for name, size in zip(jobs, jobSize))
    start = time.time()
    results = find_sets(jobList, max_size)
    end = time.time()
    print(results)
    print(f'Number of sets: {len(results)}')
    print(f'Time: {int((end - start) * 100) / 100} seconds')

 

find_all_job_sets(jobs='ABCDE', jobSize=[1,2,3,4,5], max_size=5)

Output:

[E (5), D (4), C (3), C (3) B (2), B (2), D (4) A (1), C (3) A (1), B (2) 
A (1), A (1)]
Number of sets: 9

Example:

find_all_job_sets(jobs=[f'J{i}' for i in range(100)],
                  jobSize=list(range(20,120)), max_size=200)

Output:

...
    J2 (22) J1 (21) J0 (20), J8 (28) J6 (26) J5 (25) J4 (24) J3 (23) J2 (22) J1 (21) J0 (20), J7 (27) J6 (26) J5 (25) J4 (24) J3 (23) J2 (22) J1 (21) J0 (20), J6 (26) J5 (25) J4 (24) J3 (23) J2 (22) J1 (21) J0 (20), J5 (25) J4 (24) J3 (23) J2 (22) J1 (21) J0 (20), J4 (24) J3 (23) J2 (22) J1 (21) J0 (20), J3 (23) J2 (22) J1 (21) J0 (20), J2 (22) J1 (21) J0 (20), J1 (21) J0 (20), J0 (20)]
Number of sets: 1462601
Time: 3.11 seconds

List of 200 jobs:

find_all_job_sets(jobs=[f'J{i}' for i in range(200)], 
jobSize= [12, 13, 14, 15, 16] + list(range(105, 300)), max_size=500)

Output:

    ...
    J5 (105) J4 (16) J3 (15) J2 (14) J1 (13) J0 (12), J6 (106) J5 (105) J4 (16) J3 (15) J2 (14) J1 (13) J0 (12), J5 (105) J4 (16) J3 (15) J2 (14) J1 (13) J0 (12), J4 (16) J3 (15) J2 (14) J1 (13) J0 (12), J3 (15) J2 (14) J1 (13) J0 (12), J2 (14) J1 (13) J0 (12), J1 (13) J0 (12), J0 (12)]
Number of sets: 3985093
Time: 7.42 seconds
AndrzejO
  • 1,502
  • 1
  • 9
  • 12
  • This solution is really good and efficient, thank you very much. In the meantime, I just found out that I only need to consider the combinations that involve certain jobs. For example, in my initial example, let us say I want to make sure either job 3 or job 5, or both, are in the combinations considered. Instead of the previous result, I would have combinations = [(3),(5),(1,3),(1,5),(3,5)]. Any idea on how to incorporate this on your solution? – Paulo Nascimento Sep 29 '22 at 18:57
  • I think you could replace line `if first_job.size <= max_size` with `if first_job.size <= max_size and first_job.job_name in ['B', 'C']:` It should work, but you need to put the jobs B and C at the end of the `jobs` list. If they are not at the end, it will not work – AndrzejO Sep 29 '22 at 19:41