The problem statement is simple: given an arbitrary number of one-dimensional NumPy vectors of floats, such as:
v1 = numpy.array([0, 0, 0.5, 0.5, 1, 1, 1, 1, 0, 0])
v2 = numpy.array([4, 4, 4, 5, 5, 0, 0])
v3 = numpy.array([1.1, 1.1, 1.2])
v4 = numpy.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10])
What is the fastest way to sum them?
many_vectors = [v1, v2, v3, v4]
Calling the built-in sum directly does not work, because the vectors can have different lengths:
>>> result = sum(many_vectors)
ValueError: operands could not be broadcast together with shapes (10,) (7,)
Instead, one can fall back on the pandas library, whose fillna method offers a simple way around this problem:
>>> pandas.DataFrame(v for v in many_vectors).fillna(0.0).sum().values
array([ 5.1, 5.1, 5.7, 5.5, 6. , 1. , 1. , 1. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 10. ])
But this is probably not the most efficient approach, which matters because production use cases will involve far more data:
In [9]: %timeit pandas.DataFrame(v for v in many_vectors).fillna(0.0).sum().values
1.16 ms ± 97.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
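For comparison, here is a minimal NumPy-only sketch of one possible baseline. It allocates a zero-filled accumulator as long as the longest vector and adds each vector into the matching leading slice, so the missing trailing entries count as 0. The helper name sum_uneven is hypothetical (it is not part of NumPy), and I have not timed it, so whether it beats the pandas version on real data would still need to be measured with %timeit.

```python
import numpy


def sum_uneven(vectors):
    """Sum 1-D arrays of unequal length, treating missing trailing entries as 0.

    Hypothetical helper for illustration only; not a NumPy function.
    """
    vectors = list(vectors)
    if not vectors:
        return numpy.zeros(0)
    # Accumulator sized to the longest input vector.
    result = numpy.zeros(max(len(v) for v in vectors))
    for v in vectors:
        # Add each vector into the leading slice of matching length.
        result[:len(v)] += v
    return result


v1 = numpy.array([0, 0, 0.5, 0.5, 1, 1, 1, 1, 0, 0])
v2 = numpy.array([4, 4, 4, 5, 5, 0, 0])
v3 = numpy.array([1.1, 1.1, 1.2])
v4 = numpy.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10])
many_vectors = [v1, v2, v3, v4]

total = sum_uneven(many_vectors)
print(total)
```

On the four example vectors this should give the same 16 values as the pandas one-liner. The loop is in Python, but each iteration is a single vectorized slice addition, so the cost grows with the number of vectors rather than with the number of elements handled one at a time.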