I am reading through a NumPy tutorial, and it says that sample code like this:

>>> X = np.ones(10, dtype=int)
>>> Y = np.ones(10, dtype=int)
>>> A = 2*X + 2*Y

is slow because it creates three new arrays along the way: one each for 2*X, 2*Y, and their sum A.
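
Conceptually, the one-liner expands to something like this (a sketch; tmp1 and tmp2 are hypothetical names for the temporaries NumPy allocates internally):

>>> tmp1 = np.multiply(X, 2)   # first new array
>>> tmp2 = np.multiply(Y, 2)   # second new array
>>> A = np.add(tmp1, tmp2)     # third new array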

Instead, it suggests that if speed is an issue, you should perform the same calculation like this:

>>> X = np.ones(10, dtype=int)
>>> Y = np.ones(10, dtype=int)
>>> np.multiply(X, 2, out=X)
>>> np.multiply(Y, 2, out=Y)
>>> np.add(X, Y, out=X)

Yet I don't see where the speed difference would be. In the second code, X and Y still appear to be created as intermediate arrays. Is the difference rather in the speed of np.multiply instead of 2*X?

The Nightman
  • "In the second code, X and Y still appear to be created as intermediate arrays" - what? No, they're the inputs. The code reuses them to hold intermediate results (trashing the original data) rather than allocating new arrays. – user2357112 Apr 11 '17 at 20:25
  • See http://stackoverflow.com/questions/27293830/utility-of-parameter-out-in-numpy-functions about using `out` – Thierry Lathuille Apr 11 '17 at 20:28
  • @ThierryLathuille Gotcha, this makes sense now why the second code snippet can be useful. – The Nightman Apr 11 '17 at 20:30

2 Answers

Those two code examples aren't equal in what they do. The first has to allocate new arrays to hold 2*X, 2*Y, and their sum. The second, while faster, is destructive: it overwrites X and Y in place instead of working on copies.

If you plan on reusing X and Y for later operations that need their original values (that is, you multiply X and Y for this operation, but not for a future one), you may want your initial approach so you don't have to undo the in-place changes. The destructive behavior is sketched below.
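
A minimal sketch of that destructive side, using np.arange instead of np.ones so the overwrite is visible:

>>> import numpy as np
>>> X = np.arange(5)          # original data: [0 1 2 3 4]
>>> A = 2 * X                 # new array; X is untouched
>>> np.multiply(X, 2, out=X)  # X itself now holds [0 2 4 6 8]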

Jessie

I wrapped the two examples in functions and tried some timings.
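
The wrapper functions aren't shown above; a minimal sketch of what foo1 and foo2 could look like, assuming each builds its own arrays and takes a single size parameter N:

import numpy as np

def foo1(N):
    # expression form: allocates new arrays for 2*X, 2*Y, and the sum
    X = np.ones(N, dtype=int)
    Y = np.ones(N, dtype=int)
    return 2*X + 2*Y

def foo2(N):
    # in-place form: reuses X and Y as the output buffers
    X = np.ones(N, dtype=int)
    Y = np.ones(N, dtype=int)
    np.multiply(X, 2, out=X)
    np.multiply(Y, 2, out=Y)
    np.add(X, Y, out=X)
    return X

With those definitions, the timings: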

In [136]: timeit foo1(1000)
10000 loops, best of 3: 26.4 µs per loop
In [137]: timeit foo2(1000)
10000 loops, best of 3: 27.4 µs per loop

In [138]: timeit foo1(100000)
100 loops, best of 3: 2.39 ms per loop
In [139]: timeit foo2(100000)
1000 loops, best of 3: 1.24 ms per loop

In [140]: timeit foo1(10000000)
1 loop, best of 3: 571 ms per loop
In [141]: timeit foo2(10000000)
10 loops, best of 3: 175 ms per loop

For the smaller size, the use of out doesn't make much difference. It's when the arrays get into the 10,000-and-up element range that we see the benefit of array reuse. I suspect that with larger arrays the relative cost of allocating new ones is greater: it's harder to find reusable memory blocks, which requires more calls to the OS, etc.

And the time savings are lost if I have to make copies of the two initial arrays first (to allow for their reuse):

 X = np.ones(N).copy()
 Y = np.ones(N).copy()
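
Folded into a wrapper, that copy-first variant might look like this (foo3 is a hypothetical name, not from the original):

def foo3(N):
    X0 = np.ones(N, dtype=int)
    Y0 = np.ones(N, dtype=int)
    X = X0.copy()   # preserve the originals...
    Y = Y0.copy()   # ...at the cost of two fresh allocations
    np.multiply(X, 2, out=X)
    np.multiply(Y, 2, out=Y)
    np.add(X, Y, out=X)
    return X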

This is the kind of change to consider once you've gotten rid of the iterations. Even then, SO answers are more likely to suggest numba or cython. I see it occasionally in numpy functions, but it doesn't stand out. The exception that comes to mind is np.cross (see np.source(np.cross)), which uses blocks like this:

        # cp0 = a1 * b2 - 0  (a2 = 0)
        # cp1 = 0 - a0 * b2  (a2 = 0)
        # cp2 = a0 * b1 - a1 * b0
        multiply(a1, b2, out=cp0)
        multiply(a0, b2, out=cp1)
        negative(cp1, out=cp1)
        multiply(a0, b1, out=cp2)
        cp2 -= a1 * b0
hpaulj