Bypassing meshgrid works faster when evaluating function with numpy and list comprehension

Question

In this thread I found a way to bypass meshgrid when using simple numpy equations, by usethedeathstar: numpy - evaluate function on a grid of points

I had a similar problem but using list comprehension in the equation and tried giving it a shot, didn't think it would work:

import numpy as np
import math
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

def rastrigin(dim, x, A=10):
    return  dim + np.sum([(xi**2 - A * np.cos(2 * math.pi * xi)) for xi in x])

def main():
    x = np.linspace(-4, 4, 100)    
    y = np.linspace(-4, 4, 100)

    # Option 1 - bypass meshgrid - MAGIC!
    #https://stackoverflow.com/questions/22774726/numpy-evaluate-function-on-a-grid-of-points/22778484#22778484
    Z = rastrigin(2, [x[:,None], y[None,:]])

    # Option 2 - traditional way using meshgrid
    X,Y = np.meshgrid(x,y)
    Z = np.array( [rastrigin(2, [x,y]) for x,y in zip(np.ravel(X), np.ravel(Y))] ).reshape(X.shape) 
    
    # timeit shows Option 1 is ridiculously faster than Option 2
    import timeit
    t1 = timeit.timeit(lambda: np.array( [rastrigin(2, [x,y]) for x,y in zip(np.ravel(X), np.ravel(Y))] ).reshape(X.shape) , number=100)
    t2 = timeit.timeit(lambda: rastrigin(2, [x[:,None], y[None,:]]), number=100)
    print(t1, t2)

    fig = plt.figure()
    ax = fig.gca(projection='3d')

    ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=plt.cm.plasma, linewidth=0, antialiased=False)   
    plt.show()


if __name__ == "__main__":
    main()

Not only this also works with list comprehension, something I think even the original author didn't intend, but is blazing fast. Option 1 runs in 0.003 seconds with timeit, Option 2 in 8.7 seconds.

My question is: how? I can't see why this approach works with list comprehension.

I understand this generates two arrays, one (100,1) and another (1,100): [(xi**2 - A * np.cos(2 * math.pi * xi)) for xi in x] . Then numpy.sum is propagating the sums and generating a (100,100) result? Is this expected behavior?

Your `option2` is not the traditional way of using `meshgrid`. Your `X` and `Y` are 2d array, which can usually be used in the same way as `x[:,None]` and `y[:,None]`. In fact `meshgrid` with `sparse=True` produces those (n,1) and (1,n) arrays. What's slow is raveling the arrays and passing scalar pairs to your function. One calls `rastrigin` 10000 times, the other just once. — hpaulj, Apr 08 '21 at 00:52
@hpaulj this small code is only an example. I used Rastrigin for this proof-of-concept since it's a well known benchmark function and fits my use case. The original function I'm working with is an optimization problem, can not be easily rewritten and is full of list comprehensions on input variables. If you can demonstrate an approach that works as well as Option 1 but using meshgrid I'd be happy to vote it as an answer. — nihilscientia, Apr 08 '21 at 01:31
I looked more carefully at `rastrigin`. It's use of `np.sum` is causing problems when given a list of arrays instead of scalars. — hpaulj, Apr 08 '21 at 15:30

hpaulj · Answer 1 · 2021-04-08T07:20:02.333

The talk in numpy - evaluate function on a grid of points about bypassing meshgrid is a bit misleading.

Consider evaluating the demo function on a (1000,1000) grid of values:

Using meshgrid to make a pair of (1000,1000) arrays:

In [136]: X,Y = np.meshgrid(np.linspace(-4,4,1000),np.linspace(-4,4,1000))
In [137]: timeit np.sin(X*Y)
48.1 ms ± 32.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Now do the same with the sparse setting:

In [138]: X,Y = np.meshgrid(np.linspace(-4,4,1000),np.linspace(-4,4,1000), sparse=True)
In [139]: timeit np.sin(X*Y)
47.4 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [140]: X.shape
Out[140]: (1, 1000)

This X is the same as doing np.linspace(-4,4,1000)[None,:].

The time difference is negligible. The 'sparse' arrays use less memory, but as soon as we do X*Y the result is (1000,1000), and the sin is evaluated on the same total number of points. I do prefer using

np.sin(x[:,None], y[None,:])

style, but the speed advantage comes from doing as much in compiled numpy methods, not from "bypassing meshgrid". The slow thing is to do the sin(x*y) calculation 1000*1000 times on a pair of scalars:

With a double loop:

In [144]: np.array([[np.sin(x*y) for x in np.linspace(-4,4,1000)] for y in np.linspace(-4,4,1000)]).shape
Out[144]: (1000, 1000)
In [145]: timeit np.array([[np.sin(x*y) for x in np.linspace(-4,4,1000)] for y in np.linspace(-4,4,1000)])
3.26 s ± 5.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Or with raveled meshgrid arrays:

In [146]: X,Y = np.meshgrid(np.linspace(-4,4,1000),np.linspace(-4,4,1000))
In [147]: timeit np.array([np.sin(x*y) for x,y in zip(X.ravel(), Y.ravel())]).reshape(X.shape)
3.35 s ± 5.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I realize your rastrigin function is more complicated than is test, but if you want to use numpy to its fullest, you need to avoid these python level iterations.

If you must iterate, it's often faster to work from lists (and math functions) instead of numpy:

In [148]: import math
In [149]: timeit np.array([math.sin(x*y) for x,y in zip(X.ravel().tolist(), Y.ravel().tolist())]).
    reshape(X.shape)
435 ms ± 3.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Actually in this case using math.sin for the scalar calculation is the big time saver.

One of the other answers in the link suggested using fromiter and map. The time savings are modest:

In [153]: timeit np.fromiter(map(lambda x,y:math.sin(x*y), X.ravel().tolist(), Y.ravel().tolist()),float).
     ...: reshape(X.shape)
377 ms ± 7.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

hpaulj · Answer 2 · 2021-04-08T15:28:09.583

Now let's look at your rastrigin.

In [154]: def rastrigin(dim, x, A=10):
     ...:     return  dim + np.sum([(xi**2 - A * np.cos(2 * math.pi * xi)) for xi in x])

Calling it with 2 scalars:

In [155]: rastrigin(2, [1.2, 3.4])
Out[155]: 19.99999999999999

Now make 2 arrays - and for clarity I'll make them different:

In [156]: x = np.array([1.2, 1.3, 1.4])
In [157]: y = np.array([3.4, 3.5])

A double loop solution to produce a (3,2) result. the [0,0] term matches [155]:

In [158]: np.array([[rastrigin(2,[i,j]) for j in y] for i in x])
Out[158]: 
array([[20.        , 22.59983006],
       [26.43033989, 29.03016994],
       [31.70033989, 34.30016994]])

And to take your meshgrid approach

In [159]: X,Y = np.meshgrid(x,y, indexing='ij')   # NOTE indexing
In [160]: X
Out[160]: 
array([[1.2, 1.2],
       [1.3, 1.3],
       [1.4, 1.4]])
In [161]: Y
Out[161]: 
array([[3.4, 3.5],
       [3.4, 3.5],
       [3.4, 3.5]])

These 2 arrays are (3,2) shape.

In [162]: Z = np.array( [rastrigin(2, [x,y]) for x,y in zip(np.ravel(X), np.ravel(Y))] ).reshape(X.shape)
In [163]: Z
Out[163]: 
array([[20.        , 22.59983006],
       [26.43033989, 29.03016994],
       [31.70033989, 34.30016994]])

I showed in the previous answer that these two approaches time about the same.

In [164]: Z = rastrigin(2, [x[:,None], y[None,:]])
/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py:87: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
In [165]: Z
Out[165]: 
array([[20.        , 22.59983006],
       [26.43033989, 29.03016994],
       [31.70033989, 34.30016994]])

But with the 2d array arguments we get just one value:

In [166]: Z = rastrigin(2, [X, Y])
In [167]: Z
Out[167]: 154.06084971874733

When I first looked at rastrigin I wondered why you were iterating on x, thinking x was an array. But this x is actually a list of 2 values, the [x,y] in the calling expressions. So for just 2 variables it could written as:

def rastrigin1(dim, x,y, A=10):
    return dim + (x**2 - A * np.cos(2 * np.pi * x)) +\
                 (y**2 - A * np.cos(2 * np.pi * y))

This can be called with scalars, meshgrid arrays, and broadcasted ones (and of course either of the iterations):

In [181]: rastrigin1(2, 1.2, 3.4)
Out[181]: 19.99999999999999
In [182]: rastrigin1(2, X, Y)
Out[182]: 
array([[20.        , 22.59983006],
       [26.43033989, 29.03016994],
       [31.70033989, 34.30016994]])
In [183]: rastrigin1(2, x[:,None], y[None,:])
Out[183]: 
array([[20.        , 22.59983006],
       [26.43033989, 29.03016994],
       [31.70033989, 34.30016994]])

Your rastrigin had problems with the array inputs because the np.sum step has to first make an array out list comprehension.

In [186]: np.array([x[:,None], y[None,:]])
<ipython-input-186-ae2f6203b4cd>:1: VisibleDeprecationWarning: ...
Out[186]: 
array([array([[1.2],
              [1.3],
              [1.4]]), array([[3.4, 3.5]])], dtype=object)
In [187]: np.array([X,Y])
Out[187]: 
array([[[1.2, 1.2],
        [1.3, 1.3],
        [1.4, 1.4]],

       [[3.4, 3.5],
        [3.4, 3.5],
        [3.4, 3.5]]])

[186] gets the ragged array warning because the inputs a (3,1) and (1,2), not compatible shapes. [187] makes a 3d array from 2 identically shaped arrays.

sum applied to [186] does the 'outer' broadcasted sum producing a (3,2) result. sum` applied to the (2,3,2) [187] flattens it and produces one value.

np.sum([X,Y], axis=0) will do the same as X+Y.

This version of rastrigin (untested) should give us the best of both worlds, accepting a list of arrays (not just 2), and getting the sum right.

def rastrigin(dim, alist, A=10):
    res = dim
    for xi in alist:
        res += (xi**2 - A * np.cos(2 * np.pi * xi))
    return res

Bypassing meshgrid works faster when evaluating function with numpy and list comprehension

2 Answers2