0

I have an 1D array of weights, w and an array of capacities c of the same shape as w. I need to find the smallest array of indices such that when w is split by these indices, the cumsums of split arrays less than the corresponding capacities in c. Given an array of weights and capacities as follows:

w = [1,2,3,4,5,6]; c = [3, 12, 7, 6, 12]

I need to find the smallest number of indices 'i' so that the cumsums of split arrays less than the corresponding capacities in c. In this case,

i = [2, 3, 5]

The cumsums of split arrays of w formed by i are

[1, 1+2, 1+2+3, 4, 5, 5+6]

each element is clearly less than c. The cum sums are calculated as given here.

An approximation of the required indices is also fine. But the cumsums should be strictly less than corresponding elements in c.

w is a very large array (size 100000 elements). I need a vectorized solution for it to be efficient. As said before, approximations are fine as long as the cumsums are less than c

Here is what I've tried. I've assumed that the entire c matrix is just one element repeated multiple times (I'm trying to solve a simpler case first and then add complexities). In this case, I just have to ensure that each split array has to have sum less than a given value (somewhat similar to bin packing). I find the indices as follows.

weights = np.random.random_integers(1, 20, size=(20))
capacity = 100

# Find cumulative sums and divide by capacity. This gives an approximation of indices. All elements in first
# split array would have values between 0 and 1. Those in second array would have elements between 1 and 2,
# and so on. When ever the integer part changes, a new split array would be formed. Find indices from this.
# After taking the ceiling value of all elements, elements between 0 and 1 would become 1, elements between
# 1 and 2 become 2 and so on. The place where the elements change give the indices. Take diff to find the
# boundary (of change).
indices = np.diff(np.ceil(np.cumsum(weights[i]) / self.sleigh_capacity))
# 0s represent repeated elements, 1s represent values where values change. Find the indices
indices = np.where(indices != 0)[0] + 1

This gives me the indices. One thing to note is that this might give me wrong indices, because cumulative sums are calculated from the beginning. That is, cumsum of [1,2,3,2,3] is [1,2,6,8,9]. Now if my capacity is 5. dividing cumsum by 5 and taking ceil gives me [1, 1, 2, 2, 2] which would correspond to splitting indices of [1, 4]. But the actual splitting indices are [1, 3, 4]. I'm handling this by reducing the capacity. That is, if my actual capacity is 5, I'd take it as 4 and then do the above (The value 4 is gotten by pure guess. To be on the safer side I might decrease the capacity even further).

But I'm not able to extend this method to the case where the capacities are varying. That is, if I have a capacity array of shape (1,5) then I would have to use a different approach, as this approach wouldn't work.

Community
  • 1
  • 1
Aditya369
  • 834
  • 8
  • 20
  • Thanks for telling us what you need. Like in life, not much will come out of this if you so expect people to help you for free despite having done nothing yourself. – Untitled123 Dec 31 '15 at 23:32
  • Haha. Actually, I've been spending the last 10 days on this problem alone. I initially did it using a for loop, then I approximated it by finding the cumulative sums, dividing them by c, rounding them up, finding the diffs, and then getting the indices of the 1s. I've tried quite a few things, and I'm going to keep trying. I'm just looking for people to give me ideas. My entire code is on github. I'll put a link to it if you're interested. – Aditya369 Dec 31 '15 at 23:35
  • 2
    Show us what you have already done, post relevant code to your problem so that others can try things out. Considering that you've spent 10 days on it, it's not very reasonable to expect others to commit so deeply and get back to you with a complete solution. – Untitled123 Dec 31 '15 at 23:37
  • 1
    http://stackoverflow.com/help/mcve – Untitled123 Dec 31 '15 at 23:39
  • This feels like a dp, but I would imagine a greedy solution could do decently well? Namely take as many as you can within a split and then break it off when it becomes greater than the corresponding element in c. This get's funny when you over-take an element and then the next c requires that element to be taken... Seems to suggest backtracking then, which implies solution is probably feasible with DP – Untitled123 Jan 01 '16 at 00:10
  • A greedy solution would work. That is, I can keep find cumsums of the weights. Then I iterate through the elements and look at when the cumsum crosses the corresponding capacity. I split the weights array at that index, then find the find cumsums of the remaing array again, and repeat the process until I've looked at all weights. But the problem with this is that I'm going through a while loop, and I have to perform the above operation on a 100000 elements a 1000 times. So it becomes really slow. If I can vectorize it, it would be the ideal solution. – Aditya369 Jan 01 '16 at 00:16
  • @Untitled123 should I update or modify my question according to the above greedy algorithm and just see if someone can help me vectorize the loop? – Aditya369 Jan 01 '16 at 00:17
  • Gonna have an answer to explain why greedy has an issue. – Untitled123 Jan 01 '16 at 01:58
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/99442/discussion-between-untitled123-and-aditya369). – Untitled123 Jan 01 '16 at 02:14

1 Answers1

1
w = [1,2,3,1,6,6]; c = [1,3,5, 1, 6, 12]

The only solution to this is

i=[2,3,4,5]

The greedy solution (to my understanding is to take until you cannot take)

It starts off with a 2 to get the [1,2] < =[1, 1+2] in c. However, if the next split is at 4 (as the greedy solution leads to, you get into issues since nothing can satisfy the 1). We should have instead split it at 2 and 3.

I suggested using backtracking to look back when this happens, but the running time could spiral out of control. The limit with 100k seems to suggest a linear solution or nlogn solution at worst. I have ideas of how to do this with dynamic programming, but still figuring out some specifics. Will update hopefully, or discard answer after a while. :)

Untitled123
  • 1,317
  • 7
  • 20