
I'm iterating over some input batches and generating results that have shape (BatchSize, X, Y). The BatchSize is not necessarily the same as I loop over the batches. I'd like to return a single output which is the concatenated version of the results along the batch dimension. What's the most elegant way to do this in NumPy?

I'm not so much worried about the performance but rather dealing with the multi-dimensionality of the accumulated result array.

feedMe
Milad
  • Possible duplicate of [Fastest way to grow a numpy numeric array](https://stackoverflow.com/questions/7133885/fastest-way-to-grow-a-numpy-numeric-array) – tel Jan 21 '19 at 14:15
  • Oh lord, not this question again. Since you can't correctly preallocate, the best approach is to accumulate the subarrays in a plain Python list and then concatenate all of the subarrays together at the very end. There's a bunch of QA threads on SO with potential optimizations, but the `cat` method I described tends to be nearly as good as any of them. – tel Jan 21 '19 at 14:16
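A minimal sketch of the accumulate-then-concatenate approach tel describes (the batches, shapes, and the per-batch computation here are invented for illustration):

```python
import numpy as np

# Stand-in results with varying batch sizes; each has shape (batch_size, 2, 3).
batches = [np.ones((4, 2, 3)), np.ones((7, 2, 3)), np.ones((5, 2, 3))]

chunks = []                    # plain Python list: append is O(1), no copying
for batch in batches:
    chunks.append(batch)       # in practice: chunks.append(process(batch))

result = np.concatenate(chunks, axis=0)  # one copy, at the very end
print(result.shape)                      # (16, 2, 3)
```

Each batch's data is copied exactly once (during the final `concatenate`), which is why this scales linearly with the total output size.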

2 Answers


Assuming that you have enough memory to hold all of the results, a good solution is to simply pre-allocate the memory:

result = np.empty(OUTPUT_SHAPE)  # pre-allocated output, e.g. (total_rows, X, Y)
i = 0
while i < input_tensor.shape[0]:
    batch_size = get_batch_size(i)
    # write each batch's result directly into its slice of the output
    result[i:i + batch_size] = deal_with_batch(input_tensor[i:i + batch_size])
    i += batch_size
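As a runnable illustration of the same idea (the input, batch sizes, and the stand-in for `deal_with_batch` are made up here; in the real code the total output shape must be known up front):

```python
import numpy as np

input_tensor = np.arange(12.0).reshape(6, 2)  # 6 rows, processed in batches
batch_sizes = [2, 3, 1]                        # varying sizes, summing to 6

result = np.empty(input_tensor.shape)          # pre-allocate the full output
i = 0
for bs in batch_sizes:
    result[i:i + bs] = input_tensor[i:i + bs] * 10  # stand-in per-batch work
    i += bs

print(np.array_equal(result, input_tensor * 10))   # True
```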
Him

The answer by @Scott is correct. However, I was looking for the incremental version, which I think I've found:

Define `results = np.empty((0, X, Y))` (zero length along the batch dimension) and then update it in the loop with `results = np.concatenate((results, some_func(x)))`.

I'm not sure how to think about a dimension of size 0 in NumPy, but it works.
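For concreteness, a runnable version of this incremental pattern (with made-up shapes and a stand-in for `some_func`); it works because a `(0, X, Y)` array contributes nothing to a concatenation along axis 0:

```python
import numpy as np

X, Y = 2, 3
results = np.empty((0, X, Y))            # zero-length along the batch axis
for bs in (4, 7, 5):                      # varying batch sizes
    batch_result = np.ones((bs, X, Y))    # stand-in for some_func(x)
    # allocates a new array and copies everything accumulated so far
    results = np.concatenate((results, batch_result))

print(results.shape)  # (16, 2, 3)
```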

Milad
    Beware this solution, as it has quadratic runtime. A better way would be to `.append` each `result` to a `list`, then to concatenate them ONCE at the end. – Him Jan 21 '19 at 14:52
  • To elaborate, the `np.concatenate` call has to allocate an entirely new array and perform a memcpy of the current `results` matrix. This happens for each batch, and the `results` matrix grows each time. Thus, we first memcpy 1 batch, then 2 batches, then 3, etc. `sum(range(n))` ~ n^2, i.e., as the number of batches increases, this solution will scale very poorly. – Him Jan 21 '19 at 14:55
  • 1
    Collecting the arrays in a list, and doing one `concatenate` at the end is the usual recommendation, based on performance. The incremental `concatenate` has two main problems - creating the starting array isn't as simple as `alist=[]` (as you found out), and speed (it creates a new array each time with full copy). List `append` just adds a pointer to a list, not a whole new array. – hpaulj Jan 21 '19 at 17:34