
Given three numpy 1D arrays, I want to transform them as follows:

import numpy as np

Xd = np.asarray([0, 0,   1,   1,   0.5])
Yd = np.asarray([0, 0,   0,   2.5, 2.5])
Zd = np.asarray([0, 1.5, 1.5, 1.5, 1.5])

points = np.stack([Xd, Yd, Zd], axis=1).reshape(-1, 1, 3)
segments = np.concatenate([points[:-1], points[1:]], axis = 1)    

print(segments.shape)
print(segments)

Output:

(4, 2, 3)
[[[0.  0.  0. ]
  [0.  0.  1.5]]

 [[0.  0.  1.5]
  [1.  0.  1.5]]

 [[1.  0.  1.5]
  [1.  2.5 1.5]]

 [[1.  2.5 1.5]
  [0.5 2.5 1.5]]]

Is there a way to improve the performance of this transformation?

Background

This transformation is necessary to use the XYZ coordinates with matplotlib's Line3DCollection. So far, I have only seen variations of the above code, but with thousands of coordinates, or with interpolated data for better resolution, an optimized approach becomes necessary.
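For context, here is a minimal sketch of how such a segments array is consumed (assuming a standard matplotlib install; the figure setup and axis limits are illustrative only):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Line3DCollection

Xd = np.asarray([0, 0,   1,   1,   0.5])
Yd = np.asarray([0, 0,   0,   2.5, 2.5])
Zd = np.asarray([0, 1.5, 1.5, 1.5, 1.5])

# build the (n-1, 2, 3) segments array as in the question
points = np.stack([Xd, Yd, Zd], axis=1).reshape(-1, 1, 3)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

# feed it to Line3DCollection
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.add_collection3d(Line3DCollection(segments))
ax.set_xlim(0, 1); ax.set_ylim(0, 2.5); ax.set_zlim(0, 1.5)
fig.savefig("segments.png")
```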

Summary

Thanks to @Mercury's timings, it can be concluded that for shorter arrays (<1k in length) the answer by @Miguel performs better, but the approach by @mathfux scales much better as the arrays get longer.

Mr. T

4 Answers


As a general piece of advice, when you want speed you should try to avoid stack and concatenate, as they often mean copying the same data around multiple times.

Anyway, here is how I'd do it. The code is slightly longer, but it doesn't do more work than needed:

# preallocate the output once, then fill it column by column with slices
n = len(Xd)
segments = np.empty((n-1, 2, 3))

segments[:,0,0] = Xd[:-1]
segments[:,1,0] = Xd[1:]

segments[:,0,1] = Yd[:-1]
segments[:,1,1] = Yd[1:]

segments[:,0,2] = Zd[:-1]
segments[:,1,2] = Zd[1:]
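A quick sanity check with the arrays from the question (just for illustration) confirms this produces exactly the same result as the original stack/concatenate version:

```python
import numpy as np

Xd = np.asarray([0, 0,   1,   1,   0.5])
Yd = np.asarray([0, 0,   0,   2.5, 2.5])
Zd = np.asarray([0, 1.5, 1.5, 1.5, 1.5])

# original approach from the question
points = np.stack([Xd, Yd, Zd], axis=1).reshape(-1, 1, 3)
expected = np.concatenate([points[:-1], points[1:]], axis=1)

# preallocate-and-assign approach
n = len(Xd)
segments = np.empty((n - 1, 2, 3))
for k, arr in enumerate((Xd, Yd, Zd)):
    segments[:, 0, k] = arr[:-1]   # start point of each segment
    segments[:, 1, k] = arr[1:]    # end point of each segment

print(np.array_equal(segments, expected))  # True
```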

[EDIT] - The following was made for science/fun, do not reproduce

So I tried to see if I could squeeze a little more performance out of @mathfux's answer, and I came up with this ugly code:

a = np.empty(3*n)
a[:n]    = Xd
a[n:n+n] = Yd
a[n+n:]  = Zd

# build the (n-1, 2, 3) view directly from the array interface,
# skipping the input validation that as_strided would perform
interface = dict(a.__array_interface__)
interface['shape'] = (n-1, 2, 3)
interface['strides'] = (a.itemsize, a.itemsize, n*a.itemsize)
segments = np.array(np.lib.stride_tricks.DummyArray(interface, base=a), copy=False)

On my machine, it is measurably faster (up to ~30%, depending on the size of the input). The gains come partly from the cheaper construction of a and partly from skipping the checks that as_strided performs.

Miguel
  • Oh, that's pretty big, so I guess this approach isn't that interesting. – Miguel Oct 31 '20 at 13:04
  • I added a variation on mathfux's answer if you want to have a look, although I wouldn't recommend using it. That being said, I think that the construction of the transpose in his answer could be replaced by something a little bit faster – Miguel Oct 31 '20 at 15:05
  • I prefer your early works. This becomes unreadable, at least to me. – Mr. T Oct 31 '20 at 15:46
  • I am really sorry that I will have to accept the other answer instead of yours. I like the readability aspect of your initial solution and that it performs so well for smaller arrays but the question is about performance for large-scale arrays. – Mr. T Oct 31 '20 at 17:46
  • Don't worry, I totally agree with you – Miguel Oct 31 '20 at 18:33

It seems like you're trying to roll a window of shape (2, 3) over a 2D array. This is similar to image convolution, which can be done very efficiently with np.lib.stride_tricks.

a = np.transpose([Xd, Yd, Zd])  # (n, 3) transposed view; a.strides == (itemsize, n*itemsize)
window = (2, 3)
view_shape = (len(a) - window[0] + 1,) + window # (4,2,3) if len(a) == 5
# prepending itemsize to a.strides makes each successive window start one point later
sub_matrix = np.lib.stride_tricks.as_strided(a, shape=view_shape, strides=(a.itemsize,) + a.strides)
>>> sub_matrix
array([[[0. , 0. , 0. ],
        [0. , 0. , 1.5]],

       [[0. , 0. , 1.5],
        [1. , 0. , 1.5]],

       [[1. , 0. , 1.5],
        [1. , 2.5, 1.5]],

       [[1. , 2.5, 1.5],
        [0.5, 2.5, 1.5]]])

Note that np.lib.stride_tricks is very performant compared with the alternatives, since it only creates a view and never copies the data.
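As a side note, on NumPy 1.20 and newer the same rolling window can be obtained without hand-computed strides via sliding_window_view, a safe wrapper around the same machinery (its intermediate shape differs slightly, hence the extra indexing step):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

Xd = np.asarray([0, 0,   1,   1,   0.5])
Yd = np.asarray([0, 0,   0,   2.5, 2.5])
Zd = np.asarray([0, 1.5, 1.5, 1.5, 1.5])

a = np.transpose([Xd, Yd, Zd])               # shape (5, 3)
# windows of shape (2, 3) give shape (4, 1, 2, 3); drop the singleton axis
segments = sliding_window_view(a, (2, 3))[:, 0]

# matches the original stack/concatenate result
points = np.stack([Xd, Yd, Zd], axis=1).reshape(-1, 1, 3)
expected = np.concatenate([points[:-1], points[1:]], axis=1)
print(np.array_equal(segments, expected))  # True
```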

mathfux
  • 1
    This is really impressive! – Miguel Oct 31 '20 at 13:20
  • Indeed. This approach becomes better in comparison to the original and Miguel's approach, the longer the arrays `Xd, Yd, Zd` are. – Mr. T Oct 31 '20 at 13:29
  • 1
    Alright Mr.T. So I came back with a minor extension for arrays of any length desired ant timed it. It takes 25µs for samples of length 1000 and 15ms for samples of length 1M. – mathfux Oct 31 '20 at 13:29
  • Has anybody mentioned yet that this performance improvement is impressive? – Mr. T Oct 31 '20 at 13:45
  • @Mr.T I guess everyone should learn a lot from [@Divakar](https://stackoverflow.com/users/3293881/divakar). He mentions that stride tricks is a super tool in his profile. – mathfux Oct 31 '20 at 13:50

Here are some timing tests on larger arrays, which make the differences clearer.

import numpy as np
from timeit import timeit

# original
def f1(x, y, z):
    points = np.stack([x, y, z], axis=1).reshape(-1, 1, 3)
    return np.concatenate([points[:-1], points[1:]], axis = 1)

# preallocating and then assigning
def f2(x, y, z):
    segments = np.empty((len(x)-1, 2, 3))

    segments[:,0,0] = x[:-1]
    segments[:,1,0] = x[1:]

    segments[:,0,1] = y[:-1]
    segments[:,1,1] = y[1:]

    segments[:,0,2] = z[:-1]
    segments[:,1,2] = z[1:]
    return segments

# stacking, but in one go
def f3(x, y, z):
    segments = np.stack([x[:-1], y[:-1], z[:-1], x[1:], y[1:],z[1:]], axis=1)
    return segments.reshape(-1, 2, 3)

# list comprehension
def f4(x, y, z):
    z_ = [i for i in zip(x,y,z)]
    return [[[z_[i]],[z_[i+1]]] for i in range(len(z_)-1)]

#np.lib.stride_tricks approach
def f5(x, y, z):
    a = np.transpose([x, y, z])
    window = (2, 3)
    view_shape = (len(a) - window[0] + 1,) + window # (4,2,3) if len(a) == 5
    return np.lib.stride_tricks.as_strided(a, shape = view_shape, strides = (a.itemsize,) + a.strides)
    

ntime = 5000 #number of test runs
nxd = 500    #array length

Xd = np.random.randn(nxd)
Yd = np.random.randn(nxd)
Zd = np.random.randn(nxd)

print(timeit(lambda: f1(Xd, Yd, Zd), number=ntime))
#0.11369249999999999

print(timeit(lambda: f2(Xd, Yd, Zd), number=ntime))
#0.0480651

print(timeit(lambda: f3(Xd, Yd, Zd), number=ntime))
#0.10202380000000003

print(timeit(lambda: f4(Xd, Yd, Zd), number=ntime))
#1.8407391

print(timeit(lambda: f5(Xd, Yd, Zd), number=ntime))
#0.09132560000000023
    
ntime = 50     #number of test runs
nxd = 500000   #array length

Xd = np.random.randn(nxd)
Yd = np.random.randn(nxd)
Zd = np.random.randn(nxd)

print(timeit(lambda: f1(Xd, Yd, Zd), number=ntime))
#1.7519548999999999

print(timeit(lambda: f2(Xd, Yd, Zd), number=ntime))
#1.504727

print(timeit(lambda: f3(Xd, Yd, Zd), number=ntime))
#1.5010566

print(timeit(lambda: f4(Xd, Yd, Zd), number=ntime))
#22.6208157

print(timeit(lambda: f5(Xd, Yd, Zd), number=ntime))
#0.46465339999999955

As you can see, for small to medium arrays @Miguel's way is the way to go: preallocating the array and then assigning is more efficient than any form of stacking. Even if you stack in a smarter manner, as in f3(), it's still slower than f2(). But nothing beats the stride-tricks view in f5() when the array length increases substantially.

Mr. T
Mercury

I found this to be faster than @Miguel's code.

z = [i for i in zip(Xd,Yd,Zd)]
segments = [[[z[i]],[z[i+1]]] for i in range(len(z)-1)]
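One caveat worth checking before using this: the extra brackets around z[i] add a nesting level, so the result is not shaped like the (n-1, 2, 3) array the original code produces:

```python
import numpy as np

Xd = np.asarray([0, 0,   1,   1,   0.5])
Yd = np.asarray([0, 0,   0,   2.5, 2.5])
Zd = np.asarray([0, 1.5, 1.5, 1.5, 1.5])

z = [i for i in zip(Xd, Yd, Zd)]
segments = [[[z[i]], [z[i + 1]]] for i in range(len(z) - 1)]

# each z[i] is wrapped in an extra list, adding a singleton axis
print(np.asarray(segments).shape)  # (4, 2, 1, 3), not (4, 2, 3)
```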

[screenshot of timing results]

Equinox
  • 1
    Highly unlikely. At a glance I can tell this should scale very poorly, and is actually worse than OP's original code. Try some %timeit tests with Xd, Yd, Zd as longer 1d arrays, perhaps with length 500. – Mercury Oct 31 '20 at 12:38
  • @Mercury Ah, I was doing the test the wrong way; for some reason I thought the length of x, y, z was going to remain the same. My bad. Added my test as reference. – Equinox Oct 31 '20 at 12:47