Efficient way to loop over 2D array

Question

I have a 2D array 2000x4000 and, for each cell in the array, I have to compare the value of the cell versus the standard deviation of a mask made by the 10 neighboring cells (in +/- X and +/- Y).

For example this is what I'm doing right now:

import numpy as np
from astropy.stats import sigma_clipped_stats

BPmap=[]
N=10
a=np.random.random((2000,4000))
for row in range(N,a.shape[0]-N):
    BPmap_row=[]
    for column in range(N,a.shape[1]-N):
        Bpmap_data=np.array(a[row-N:row+N,column-N:column+N].ravel()) 
        mean, median, std = sigma_clipped_stats(Bpmap_data, sigma=3,iters=5)
        BPmap_Nsigma=float(a[row][column]-median)/std                 
        BPmap_row.append(BPmap_Nsigma)
    BPmap.append(BPmap_row)

This has the obvious problem that I'm making 2000x4000=8000000 loops and it's taking very long. What I need is to find a very efficient way to perform these operations but I have no idea how.

Implicitly you are squaring each number in the array many, many times. Perhaps you could just square each number once (keeping an auxiliary matrix of squares) and move through both the original matrix and the matrix of squares in sliding windows of the appropriate size, gathering the appropriate sums. — John Coleman, Jun 12 '19 at 13:47
@JohnColeman not sure I understand what you mean. There is any very quick way to put all this little tiles 21x21 (+/-10 in each direction plus the central pixel I want the value from) in a list and then perform just once a robust standard deviation over all of them at the same time — Giovanni Maria Strampelli, Jun 12 '19 at 13:59
[This answer](https://stackoverflow.com/a/18422519/4996248) (as well as [this answer](https://stackoverflow.com/a/18423835/4996248) to the same question) is the sort of thing that I had in mind. [This question](https://stackoverflow.com/q/32660953/4996248) also seems relevant. — John Coleman, Jun 12 '19 at 14:06

politinsa · Answer 1 · 2019-06-19T14:24:56.567

We generally avoid using double-for loop with numpy; it's slow and with smart indexing (array[:, i-N]...) we can do a lot in a single loop.
But for your convolution problem, it might be the easiest ~~(and only ?)~~ way to do what you want. (edit: it's not. See below @Masoud's answer).

Bpmap_data=np.array(a[row-N:row+N,column-N:column+N].ravel()) is creating a new array at each loop.
Computing median and std without creating a new array (ie using the numpy view directly) would be much faster.

In fact, it's 40 times faster (on my google colab instance)

To get a 400 times faster algo, please look at @Masoud answer that is using scipy filter for 2D-array.

import numpy as np
from astropy.stats import sigma_clipped_stats


N=10
a=np.random.random((80,40))


def f():
  """Your original code"""
  BPmap=[]
  for row in range(N,a.shape[0]-N):
      BPmap_row=[]
      for column in range(N,a.shape[1]-N):
          Bpmap_data=np.array(a[row-N:row+N,column-N:column+N].ravel()) 
          mean, median, std = sigma_clipped_stats(Bpmap_data, sigma=3,iters=5)
          BPmap_Nsigma=float(a[row][column]-median)/std                 
          BPmap_row.append(BPmap_Nsigma)
      BPmap.append(BPmap_row)
  return BPmap

def f2():
  """this little guy is improving a lot your work"""
  BPmap=[]
  for row in range(N,a.shape[0]-N):
      BPmap_row=[]
      for column in range(N,a.shape[1]-N):
          # the next 3 lines do not need any more memory
          view = a[row-N:row+N,column-N:column+N]
          std_without_outliers = view[view - view.mean() < 3*view.std()].std()
          median = np.median(view)
          # back to your workflow
          BPmap_Nsigma=float(a[row][column]-median)/std_without_outliers                 
          BPmap_row.append(BPmap_Nsigma)
      BPmap.append(BPmap_row)
  return BPmap

%time _ = f()
%time _ = f2()

f() == f2()

>>>CPU times: user 39.7 s, sys: 14.2 ms, total: 39.7 s
Wall time: 39.7 s
CPU times: user 969 ms, sys: 2.99 ms, total: 972 ms
Wall time: 976 ms
True

Edit
In fact, sigma_clipped_stats(a[row-N:row+N,column-N:column+N]) does slow down the loop. I suspect sigma_clipped_stats of creating a copy of its argument.

im taking the std after getting rid of outliers from a 3 sigma cut

I'm showing here a way to do this with pure numpy; this is really faster than the function you used before.

In the end, f() = f2() so why using this astropy function anymore ?

Is it? in f() im taking the std after getting rid of outliers from a 3 sigma cut, in f2 I'm just taking the std (unless I'm miss-understanding the scope of sigma_clipped_stats). — Giovanni Maria Strampelli, Jun 12 '19 at 14:24
@GiovanniMariaStrampelli see my edit; my code does now cut outliers, but with a normal distribution there are very few outliers, it's why `f() == f2()`was true — politinsa, Jun 12 '19 at 14:37
Thanks this looks very promising, I'll test it with my actual data (that's it's a bit more complicated than this - this is why I need the 3-sigma cut) and I'll let you know. — Giovanni Maria Strampelli, Jun 12 '19 at 15:04
You might use a progressbar to monitor the status of the computing. — politinsa, Jun 12 '19 at 15:25

Masoud · Answer 2 · 2019-06-13T12:32:19.410

7

There are some issues with the code that decrease the performance:

As described here, avoid using for-loops.
You are actually squaring each number 10*10 times.

Instead of for-loop, you can use Scipy.ndimage and opencv librarians to perform convolution. Whereas these libraries are used for image processing, they are so efficient for processing any 2D array. Here is a code that perform the same task as required using Scipy.ndimage tools, but 1000X faster (23ms vs 27s for a 200X400 array). I used an algorithm provided here to calculate standard deviation:

import numpy as np
from scipy.ndimage.filters import uniform_filter, median_filter

a=np.random.random((200,400))
c1 = uniform_filter(a, size = (10,10))
c2 = uniform_filter(a*a, size = (10,10))
std = ((c2 - c1*c1)**.5)
med = median_filter(a, size=(10, 10))

BPmap = (a - med)/std

edited Jun 13 '19 at 12:32

answered Jun 12 '19 at 21:52

Masoud

1,270
7
12

2

This answer is clearly the best, I'll mention it in mine. – politinsa Jun 13 '19 at 07:59
1

Excellent answer. OP is actually squaring most numbers 21*21 = 441 times, which accounts for much of your 1000X speed-up. – John Coleman Jun 13 '19 at 12:23
Indeed, squaring 0.19336719 for 441*200*400 times takes 5s on my machine. I suppose other part comes from numpy vectorization. – Masoud Jun 13 '19 at 12:48

Efficient way to loop over 2D array

2 Answers2

In fact, it's 40 times faster (on my google colab instance)