
I have Python code that reads certain arrays from a text file, does a lot of processing, and then returns a 2D array, `Intensity`, which is plotted using `imshow`. My code is now too slow because of a call to `np.where` inside a nested loop.

I have tried hard to use the `multiprocessing` module (and `joblib`). In both cases the code kept running forever without any error. After some research (see "Jupyter notebook never finishes processing using multiprocessing (Python 3)") I found that `multiprocessing` has problems under IPython and on Windows, where it gives a broken pipe error (see "Broken pipe error with multiprocessing.Queue" and https://github.com/spyder-ide/spyder/issues/7832). I tried the workarounds given there, but nothing helped.


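```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

# Xnew and Ynew are 1D arrays read from the data file (loading code not shown).

xlin = np.linspace(-40, 40, 81)
ylin = np.linspace(-40, 40, 81)
xxlin, yylin = np.meshgrid(xlin, ylin)

# 81 x 81 array; the loops below fill only the first 80 rows and columns
Intensity = np.zeros(np.shape(xxlin))

for i in range(len(xlin) - 1):
    for j in range(len(ylin) - 1):
        # indices of the points in cell [xlin[i], xlin[i+1]) x [ylin[j], ylin[j+1])
        k = np.where((Xnew >= xlin[i]) & (Xnew < xlin[i + 1]) &
                     (Ynew >= ylin[j]) & (Ynew < ylin[j + 1]))
        N1 = np.shape(k)[1]  # number of points in this cell
        Intensity[j, i] = Intensity[j, i] + N1

fig = plt.figure(2, figsize=(14, 14))
plt.imshow(Intensity, origin='lower', interpolation='nearest', cmap=cm.jet)
```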

I have no other issues apart from the speed of the code. After much toiling I realised that `np.where` is the real culprit. Can someone suggest an alternative suited to my case? I would also appreciate help with parallelising this nested loop.
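One way to remove `np.where` entirely, sketched here under the assumption that `Xnew` and `Ynew` are 1D numeric arrays of equal length and that `xlin`/`ylin` are sorted bin edges as in the code above, is to compute each point's cell index once with `np.searchsorted` and then count every cell with a single `np.bincount` call. The nested loop evaluates four comparisons over the whole dataset for each of the 80 x 80 cells; this version makes one pass instead. `bin_counts_bincount` is a hypothetical helper name, not something from the question.

```python
import numpy as np


def bin_counts_bincount(Xnew, Ynew, xlin, ylin):
    """Count points in each cell [xlin[i], xlin[i+1]) x [ylin[j], ylin[j+1]).

    Returns an array of shape (len(ylin) - 1, len(xlin) - 1) indexed [j, i],
    i.e. y first, matching Intensity[j, i] in the question.
    """
    Xnew = np.asarray(Xnew)
    Ynew = np.asarray(Ynew)
    nx = len(xlin) - 1
    ny = len(ylin) - 1
    # searchsorted(..., side="right") - 1 gives the i with xlin[i] <= x < xlin[i+1];
    # points below the first edge get -1, points at or above the last edge get nx
    ix = np.searchsorted(xlin, Xnew, side="right") - 1
    iy = np.searchsorted(ylin, Ynew, side="right") - 1
    # keep only points inside the half-open grid, as the original loop does
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    # flatten (iy, ix) to one index so a single bincount counts every cell
    flat = iy[inside] * nx + ix[inside]
    counts = np.bincount(flat, minlength=nx * ny)
    return counts.reshape(ny, nx)
```

Because the result has one row and column fewer than the question's 81 x 81 `Intensity`, it could be dropped in as `Intensity[:-1, :-1] = bin_counts_bincount(Xnew, Ynew, xlin, ylin)`.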

  • If a loop is unavoidable, try [`numba`](https://numba.pydata.org/). I don't have time to give a full answer, but there's one example at the end of [this answer](https://stackoverflow.com/a/52674448/9209546). *Only* if even this is too slow would I consider other options. – jpp Aug 18 '19 at 20:57
  • I can't help much with the `np.where` part of your question, but as a general rule of thumb, if you can write your operations in terms of numpy's matrix operations, [it will generally do its best to parallelize them for you](https://scipy-cookbook.readthedocs.io/items/ParallelProgramming.html). – rnorris Aug 18 '19 at 21:00
  • You may consider using [`numpy.histogram`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html#numpy.histogram) and [`numpy.histogram2d`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram2d.html) to accomplish your task rather than using explicit for loops. – GZ0 Aug 18 '19 at 21:18
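GZ0's `numpy.histogram2d` suggestion can be sketched as below, assuming `Xnew` and `Ynew` are 1D numeric arrays of equal length. `bin_counts_histogram2d` is a hypothetical helper name. `np.histogram2d` returns counts indexed `[x_bin, y_bin]`, so the result is transposed to match `Intensity[j, i]`. One behavioural difference from the loop: NumPy's last bin is closed on the right, so a point lying exactly on the upper edge (x = 40 or y = 40) is counted here but skipped by the original half-open comparisons.

```python
import numpy as np


def bin_counts_histogram2d(Xnew, Ynew, xlin, ylin):
    """Count points per cell with np.histogram2d, using xlin/ylin as bin edges.

    Returns an array of shape (len(ylin) - 1, len(xlin) - 1) indexed [j, i].
    Note: unlike the original loop, points exactly on the last edge are included.
    """
    # H[i, j] counts points with x in x-bin i and y in y-bin j; points outside
    # the edges are ignored
    H, xedges, yedges = np.histogram2d(Xnew, Ynew, bins=[xlin, ylin])
    # transpose so rows index y, matching Intensity[j, i] and imshow's layout
    return H.T
```

This keeps the plotting code unchanged apart from writing the result into `Intensity[:-1, :-1]` (or plotting the 80 x 80 result directly).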
