
I am working on a signal processing related problem. I have a dataset of >2000 EEG signals. Each EEG signal is represented by a 2D NumPy array (19 x 30000). Each row of the array is one channel of the signal. What I have to do is compute the spectrogram of each individual channel (row) and concatenate the results vertically. Here is the code I wrote so far.

import numpy as np
import matplotlib.pyplot as plt
import cv2

raw = np.load('class_1_ar/'+filename)

images = []

for i in range(19):
    print(i, end=" ")
    spec, freq, t, im = plt.specgram(raw[i], Fs=100, NFFT=100, noverlap=50)
    plt.axis('off')
    figure = plt.gcf()
    figure.set_size_inches(12, 1)
    figure.canvas.draw()

    # crop the rendered canvas to the axes area and convert to OpenCV's channel order
    b = figure.axes[0].get_window_extent()
    img = np.array(figure.canvas.buffer_rgba())
    img = img[int(b.y0):int(b.y1), int(b.x0):int(b.x1), :]
    img = cv2.cvtColor(img, cv2.COLOR_RGBA2BGRA)

    images.append(img)

base = cv2.vconcat(images)
cv2.imwrite('class_1_sp/'+filename[:-4]+'.png', base)

c -= 1  # countdown over the remaining files (c and filename come from an outer loop, omitted here)
print(c)

And here is my output:

[output image: the 19 per-channel spectrograms stacked vertically]

However, the process is taking too long: it took almost 8 hours to process the first 200 samples.

My question is: what can I do to make it faster?

  • Did you profile your code to see which part takes the most time? It looks like you plot a spectrogram and then grab the image from the screen? That seems a rather roundabout way of doing things... – Cris Luengo Apr 03 '21 at 05:58
  • Assuming the generation of each EEG is independent of the others, you can parallelize your code using Python multiprocessing. Moreover, as pointed out by Cris Luengo, there is probably a faster way to store figures in NumPy arrays. Also note that matplotlib is generally quite slow; there is probably a faster package for this (take a look at scipy). – Jérôme Richard Apr 03 '21 at 11:21
  • Thanks. Parallelizing it worked. Now it's way faster. – Michio Kaku Apr 03 '21 at 18:26

1 Answer


Like others have said, the overhead of going through matplotlib is likely slowing things down. It would be better to just compute (and not plot) the spectrogram with scipy.signal.spectrogram. This function directly returns the spectrogram as a 2D NumPy array, so you don't have the roundabout step of getting it out of the canvas. That does mean you'll have to map the spectrogram output to pixel intensities yourself. In doing that, beware that scipy.signal.spectrogram returns the spectrogram as powers, not decibels, so you probably want to apply 10*np.log10(Sxx) to the result (see also scipy.signal.spectrogram compared to matplotlib.pyplot.specgram).
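As a minimal sketch of that idea (assuming a 19 x 30000 float array sampled at 100 Hz, as in the question; the random array here just stands in for one EEG recording), each channel's spectrogram can be computed and mapped to 8-bit pixel intensities without touching matplotlib:

```python
import numpy as np
from scipy import signal

raw = np.random.randn(19, 30000)  # stand-in for one EEG recording, Fs = 100 Hz

rows = []
for channel in raw:
    # same segment length and overlap as the original plt.specgram call
    f, t, Sxx = signal.spectrogram(channel, fs=100, nperseg=100, noverlap=50)
    Sxx_db = 10 * np.log10(Sxx + 1e-12)  # powers -> decibels; epsilon avoids log(0)
    # scale this channel's spectrogram to 0..255 for image output
    lo, hi = Sxx_db.min(), Sxx_db.max()
    img = ((Sxx_db - lo) / (hi - lo) * 255).astype(np.uint8)
    rows.append(img[::-1])  # flip so low frequencies end up at the bottom, as in specgram

stacked = np.vstack(rows)  # vertical concatenation of all 19 channels
```

The grayscale result can be written directly with cv2.imwrite; a colormap (e.g. cv2.applyColorMap) is only needed if you want output that looks like matplotlib's colored plots.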

Plotting aside, the bottleneck in computing a spectrogram is the FFTs. Instead of using a transform size of 100 samples, 128 or some other power of 2 is more efficient. With scipy.signal.spectrogram this is done by setting nfft=128. Note that you can set nperseg=100 and nfft=128 so that 100 samples are still used for each segment, but zero-padded to 128 before doing the FFT. One other thought: if raw is 64-bit float, it may help to cast it to 32-bit: raw = np.load(...).astype(np.float32).
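Both suggestions can be combined in one call. scipy.signal.spectrogram also works along the last axis of a 2D array, so all 19 channels can be transformed at once (again using a random array as a stand-in for the real data):

```python
import numpy as np
from scipy import signal

# 32-bit cast, as suggested above
raw = np.random.randn(19, 30000).astype(np.float32)

# 100-sample segments with 50-sample overlap, zero-padded to a power-of-two FFT size
f, t, Sxx = signal.spectrogram(raw, fs=100, nperseg=100, noverlap=50, nfft=128)

# spectrogram operates over the last axis, so one call covers every channel:
# Sxx has shape (channels, frequency bins, time segments)
print(Sxx.shape)
```

With nperseg=100 and noverlap=50 over 30000 samples there are 599 segments, and nfft=128 gives 128//2 + 1 = 65 one-sided frequency bins, so Sxx has shape (19, 65, 599).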

Pascal Getreuer