Fastest way to get average value of frequencies within range

Question

I am new in python as well as in signal processing. I am trying to calculate mean value among some frequency range of a signal.

What I am trying to do is as follows:

import numpy as np
data = <my 1d signal>
lF = <lower frequency>
uF = <upper frequency>
ps = np.abs(np.fft.fft(data)) ** 2 #array of power spectrum

time_step = 1.0 / 2000.0

freqs = np.fft.fftfreq(data.size, time_step) # array of frequencies
idx = np.argsort(freqs) # sorting frequencies

sum = 0
c =0
for i in idx:
    if (freqs[i] >= lF) and (freqs[i] <= uF) :
        sum += ps[i]
        c +=1
avgValue = sum/c
print 'mean value is=',avgValue

I think calculation is fine, but it takes a lot of time like for data of more than 15GB and processing time grows exponentially. Is there any fastest way available such that I would be able to get mean value of power spectrum within some frequency range in fastest manner. Thanks in advance.

EDIT 1

I followed this code for calculation of power spectrum.

EDIT 2

This doesn't answer to my question as it calculates mean over the whole array/list but I want mean over part of the array.

EDIT 3

Solution by jez of using mask reduces time. Actually I have more than 10 channels of 1D signal and I want to treat them in a same manner i.e. average frequencies in a range of each channel separately. I think python loops are slow. Is there any alternate for that? Like this:

for i in xrange(0,15):
        data = signals[:, i]
        ps = np.abs(np.fft.fft(data)) ** 2
        freqs = np.fft.fftfreq(data.size, time_step)
        mask = np.logical_and(freqs >= lF, freqs <= uF )
        avgValue = ps[mask].mean()
        print 'mean value is=',avgValue

You probably want to use `and` instead of `&` in your `if` directive. — ForceBru, Dec 19 '15 at 17:14
One does bitwise-and while the other does boolean-and. No, doesn't matter here, but that's only because that is an implementation detail of Python. Using `and` there is preferred for clarity. — OneCricketeer, Dec 19 '15 at 17:24
@cricket_007 and ForceBru thanks I changed it. I am seeking for pythonic way to access frequency range instead of looping among them. — Muaz, Dec 19 '15 at 17:27
Possible duplicate of [Calculating arithmetic mean (average) in Python](http://stackoverflow.com/questions/7716331/calculating-arithmetic-mean-average-in-python) — OneCricketeer, Dec 19 '15 at 17:29
I see your update. I fail to see how doing anything different from what you have will be "faster". I can provide a Pythonic answer, but it will do the same thing you have. The only recommendation I can give for speed is that you don't need to sort the list before finding the average. You can do `for freq in freqs:` and operate on the `freq` value instead of `freqs[i]` — OneCricketeer, Dec 19 '15 at 17:40
@Muaz, numpy is very relevant to your question so I added the tag, you will want to avoid slow python loops — Padraic Cunningham, Dec 19 '15 at 17:43
You seem to be computing a nonsense value that is merely the mean of quantized UF and LF, which can be done in one line of code. — hotpaw2, Dec 19 '15 at 17:59
@hotpaw2 Can you please explain how exactly? I am really stuck. — Muaz, Dec 19 '15 at 18:04
Concerning *Edit 3*: Since your signals have the same length, calculate the mask outside the loop. Furthermore, use `rfft(data, n=n_pow2)` instead of `fft(data)` and multiply your resulting mean by 2, where ``n_pow2=2**(ceil(log2(len(data)))`` pads data with zeros to the next power of 2. This ensures that the faster FFT algorithm is used and not the slower DFT algorithm. `rfft()` only calculates the positive frequencies (halft the spectrum), which is ok if the signal is real-valued. — Dietrich, Dec 22 '15 at 12:17

jez · Accepted Answer · 2015-12-22T02:16:24.417

The following performs a mean over a selected region:

mask = numpy.logical_and( freqs >= lF, freqs <= uF )
avgValue = ps[ mask ].mean()

For proper scaling of power values that have been computed as abs(fft coefficients)**2, you will need to multiply by (2.0 / len(data))**2 (Parseval's theorem)

Note that it gets slightly fiddly if your frequency range includes the Nyquist frequency—for precise results, handling of that single frequency component would then need to depend on whether data.size is even or odd). So for simplicity, ensure that uF is strictly less than max(freqs). [For similar reasons you should ensure lF > 0.]

The reasons for this are tedious to explain and even more tedious to correct for, but basically: the DC component is represented once in the DFT, whereas most other frequency components are represented twice (positive frequency and negative frequency) at half-amplitude each time. The even-more-annoying exception is the Nyquist frequency which is represented once at full amplitude if the signal length is even, but twice at half amplitude if the signal length is odd. All of this would not affect you if you were averaging amplitude: in a linear system, being represented twice compensates for being at half amplitude. But you're averaging power, i.e. squaring the values before averaging, so this compensation doesn't work out.

I've pasted my code for grokking all of this. This code also shows how you can work with multiple signals stacked in one numpy array, which addresses your follow-up question about avoiding loops in the multi-channel case. Remember to supply the correct axis argument both to numpy.fft.fft() and to my fft2ap().

Please see my updated post. Also can you please elaborate more why I need to scale my power spectrum. — Muaz, Dec 21 '15 at 17:17

Dietrich · Answer 2 · 2015-12-19T19:00:50.280

If you really have a signal of 15 GB size, you'll not be able to calculate the FFT in an acceptable time. You can avoid using the FFT, if it is acceptable for you to approximate your frequency range by a band pass filter. The justification is the Poisson summation formula, which states that sum of squares is not changed by a FFT (or: the power is preserved). Staying in the time domain will let the processing time rise proportionally to the signal length.

The following code designs a Butterworth band path filter, plots the filter response and filters a sample signal:

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

dd = np.random.randn(10**4)  # generate sample data
T = 1./2e3  # sampling interval
n, f_s = len(dd), 1./T  # number of points and sampling frequency

# design band path filter:
f_l, f_u = 50, 500  # Band from 50 Hz to 500 Hz
wp = np.array([f_l, f_u])*2/f_s  # normalized pass band frequnecies
ws = np.array([0.8*f_l, 1.2*f_u])*2/f_s  # normalized stop band frequencies
b, a = signal.iirdesign(wp, ws, gpass=60, gstop=80, ftype="butter",
                        analog=False)

# plot filter response:
w, h = signal.freqz(b, a, whole=False)
ff_w = w*f_s/(2*np.pi)
fg, ax = plt.subplots()
ax.set_title('Butterworth filter amplitude response')
ax.plot(ff_w, np.abs(h))
ax.set_ylabel('relative Amplitude')
ax.grid(True)
ax.set_xlabel('Frequency in Hertz')
fg.canvas.draw()


# do the filtering:
zi = signal.lfilter_zi(b, a)*dd[0]
dd1, _ = signal.lfilter(b, a, dd, zi=zi)

# calculate the avarage:
avg = np.mean(dd1**2)
print("RMS values is %g" % avg)
plt.show()

Read the documentation to Scipy's Filter design to learn how to modify the parameters of the filter.

If you want to stay with the FFT, read the docs on signal.welch and plt.psd. The Welch algorithm is a method to efficiently calculate the power spectral density of a signal (with some trade-offs).

hsinghal · Answer 3 · 2015-12-19T18:49:09.070

It is much easier to work with FFT if your arrays are power of 2. When you do fft the frequencies ranges from -pi/timestep to pi/timestep (assuming that frequency is defined as w = 2*pi/t, change the values accordingly if you use f =1/t representation). Your spectrum is arranged as 0 to minfreqq--maxfreq to zero. you can now use fftshift function to swap the frequencies and your spectrum looks like minfreq -- DC -- maxfreq. now you can easily determine your desired frequency range because it is already sorted.

The frequency step dw=2*pi/(time span) or max-frequency/(N/2) where N is array size. N/2 point is DC or 0 frequency. Nth position is max frequency now you can easily determine your range

Lower_freq_indx=N/2+N/2*Lower_freq/max_freq
Higher_freq_index=N/2+N/2*Higher_freq/Max_freq
avg=sum(ps[lower_freq_indx:Higher_freq_index]/(Higher_freq_index-Lower_freq_index)

I hope this will help

regards

Fastest way to get average value of frequencies within range

3 Answers3