
I have a large NumPy matrix (shape: 2e6 × 6) containing geophysical data. Inside 3 nested for loops I search the matrix for matching values to assign my variables.

My first solution uses np.where. It's way too slow! I read that an explicit for loop might perform better; however, the code I came up with is even slightly slower.

Does anyone have an idea how to improve the performance, please?

First Solution (np.where)

for lat in LATS:
    for lon in LONS:
        for depth in range(1, 401):

            # boolean mask on all three key columns, then take the first matching row
            node_point_line = matrix[np.where((matrix[:, 0] == lat) & (matrix[:, 1] == lon) & (matrix[:, 2] == depth))][0]

            var1 = node_point_line[3]
            var2 = node_point_line[4]
            var3 = node_point_line[5]
            ...

Second Solution (extra for loop)

matrix_flat = matrix.flatten()  # flatten once, outside the loops

for lat in LATS:
    for lon in LONS:
        for depth in range(1, 401):

            # step through the flat array one 6-element row at a time
            for i in range(0, len(matrix_flat), 6):
                if matrix_flat[i] == lat and matrix_flat[i+1] == lon and matrix_flat[i+2] == depth:
                    var1 = matrix_flat[i+3]
                    var2 = matrix_flat[i+4]
                    var3 = matrix_flat[i+5]
                    ...

Again, both solutions are too slow. I'd like to avoid Fortran or C++ (I know they'd be faster). Any suggestions?

Johngoldenboy
  • It's not so much that `where` is slow, but that you are doing it many times. You are iterating over 3 levels. What are you doing with `var1,var2,var3`? – hpaulj Feb 16 '17 at 07:50
  • I use those variables to calculate an integral over depth, for each LAT/LON point. – Johngoldenboy Feb 16 '17 at 07:52
  • @Johngoldenboy looks like there is an [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) here: you ask to optimize a specific part of an algorithm (the lookups) while trying to solve a bigger problem. I'd suggest opening a separate question (probably on the [CS](http://cs.stackexchange.com/) website) asking for a faster algorithm for calculating the integral. I think what you do is extremely suboptimal, but it's hard to reason about, as I don't know what exactly you are trying to do. – yeputons Feb 16 '17 at 08:10
  • In `numpy` we try to avoid loops, especially multiple levels of them. We push the iterations down onto fast compiled functions, and only iterate explicitly when that is impossible. – hpaulj Feb 16 '17 at 08:28
  • Maybe one approach is to use JIT compilation via numba (http://numba.pydata.org/numba-doc/0.17.0/user/jit.html). It can give acceleration for this type of problem. – Roman Fursenko Feb 16 '17 at 08:58

3 Answers


How big are your LATS and LONS arrays? Even if they are only one element long, your program has to perform approximately 1 * 1 * 400 * 1e6 * 3 ≈ 1.2e9 operations, which is far too many. Even implemented properly in C++ it could take a second or more.

I think you should optimize your algorithm. It's not entirely clear what you're trying to do, but my guess is that you want every row of the matrix whose latitude is in your LATS list, whose longitude is in your LONS list, and whose depth is between 1 and 400. Swap the loops around: run along the matrix in the outer loop and check that all three conditions hold for each row. You can then replace the list of latitudes with a set of latitudes, and set lookup is much faster than list lookup, as sketched below.
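
A minimal sketch of that reordering (assuming LATS and LONS hold the same scalar type as the matrix columns, and that visiting every matching row once is all that's needed):

import numpy as np

lat_set = set(LATS)  # O(1) membership tests instead of list scans
lon_set = set(LONS)

for row in matrix:  # a single pass over the rows
    lat, lon, depth = row[0], row[1], row[2]
    if lat in lat_set and lon in lon_set and 1 <= depth <= 400:
        var1, var2, var3 = row[3], row[4], row[5]
        # ... accumulate per (lat, lon) here ...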

yeputons
  • I need to calculate an integral over depth, using var1 to var3. Both LAT and LON have a size of ~100; both are already a `list(set(matrix[:,1]))`. You may be right that I should re-organise my for loops. – Johngoldenboy Feb 16 '17 at 07:59
  • If both arrays are 100 long, then the number of operations is enormous: `100*100*400*1e6*3 ~ 1e13`. That's hours, if not days, depending on hardware and optimizations. – yeputons Feb 16 '17 at 08:09
  • Exactly. That's why I need to optimize it. – Johngoldenboy Feb 16 '17 at 08:11

Given the numbers, it should be reasonably efficient to just lexsort your data array. That's 2 seconds invested up front that removes all the `where` calls.

The code snippet assumes that LATS and LONS are sorted:

order = np.lexsort(matrix.T[2::-1])  # sort by lat, then lon, then depth; takes < 2 sec
matrix_sorted = matrix[order, :]

# split into one sub-array per latitude, then per longitude
data_per_lat = np.split(matrix_sorted, matrix_sorted[:, 0].searchsorted(LATS[1:]))
for lat, dpla in zip(LATS, data_per_lat):
    data_per_lon = np.split(dpla, dpla[:, 1].searchsorted(LONS[1:]))
    for lon, dplo in zip(LONS, data_per_lon):
        depth, var1, var2, var3 = dplo.T[2:]
        # the variables are now vectors, containing all data matching lat and lon, sorted by depth
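
If the end goal is the depth integral mentioned in the comments, each (lat, lon) group can now be integrated in one vectorized call. A minimal sketch, with `np.trapz` assumed as a stand-in for whatever quadrature rule is actually needed, placed inside the inner loop:

        # integrate var1 over the sorted depth samples for this (lat, lon)
        integral_var1 = np.trapz(var1, x=depth)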
Paul Panzer

Well, you can compress your starting matrix in one vectorized step, which will make your looping faster.

DPTH = np.arange(1, 401)
# keep only the rows whose lat, lon and depth are all wanted
mask = np.in1d(matrix[:, 0], LATS) & np.in1d(matrix[:, 1], LONS) & np.in1d(matrix[:, 2], DPTH)
matrix_masked = matrix[mask]

Then just loop over `matrix_masked`, either with `nditer` (complicated) or with:

for i in range(matrix_masked.shape[0]):
    var1 = matrix_masked[i, 3]
    . . .
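
Alternatively, since `matrix_masked` already contains only the matching rows, the Python loop can often be skipped entirely by slicing whole columns at once (a sketch, assuming the same 6-column layout):

var1 = matrix_masked[:, 3]  # all matching var1 values as one vector
var2 = matrix_masked[:, 4]
var3 = matrix_masked[:, 5]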
Daniel F