2

I am trying to come up with a way to group together a list of [name, size] pairs. The idea is to group them together so that each group is as close as possible to a maximum_group_size, but without breaking a maximum_group_length cutoff.

For example, if maximum_group_size = 10, and maximum_group_length = 3, then the following simple dataset could be grouped as so:

data = [['A', 3], ['B', 6], ['C', 7], ['D', 1]]
grouped: [[['A', 3], ['B', 6]], [['C', 7], ['D', 1]]]

But this data set would end up as:

data = [['A', 1], ['B', 1], ['C', 1], ['D', 1]]
grouped: [[['A', 1], ['B', 1], ['C', 1]], ['D', 1]]]

The input data are unsorted, but could easily be sorted. My intuition was not to do so because sorting would lead to the largest number of groups that are not full.

How could I go about (prettily) doing this? My current implementation is not very elegant:

def group_references(sam_handle, num_bases=10 ** 7, max_seqs=100):
    """
    Group up references by num_bases, unless that exceeds max_seqs.
    """
    name_iter = itertools.izip(*[sam_handle.references, sam_handle.lengths])
    name, size = name_iter.next()
    this_bin = [[name, size]]
    bin_base_count = size
    num_seqs = 1
    for name, size in name_iter:
        bin_base_count += size
        num_seqs += 1
        if bin_base_count >= num_bases or num_seqs > max_seqs:
            yield this_bin
            this_bin = [[name, size]]
            bin_base_count = size
            num_seqs = 1
        else:
            this_bin.append([name, size])
Ian Fiddes
  • 2,821
  • 5
  • 29
  • 49

2 Answers2

2

You can prove that if you can solve this optimally in polynomial time, you could solve the knapsack problem in polynomial time as well, which is impossible. Therefore, the solution is non-polynomial.

Anyways, I suggest looking at various approximate solutions for the knapsack problem and pick the one that is most appropriate for your case.

ElKamina
  • 7,747
  • 28
  • 43
0

Your problem is a Bin packing problem or Packing problem.

Take a look at a similar question where you can find a piece of code in python.

Community
  • 1
  • 1
Eric Citaire
  • 4,355
  • 1
  • 29
  • 49