
I have a function that computes the Fourier transform of an incoming audio stream and then identifies which musical notes are most dominant within the audio. Everything is working (though I'm sure my code may be janky in many ways). The main thing I'd like to fix is the speed of the function. I'm using a chunk size of 2^14, which makes the function quite accurate, but also way slower than I'd like. I was wondering if there is any way to implement a C FFT function that could run faster, but still have it run within my Python code. I'm also open to any other suggestions on how to get this function to run faster.

Here is the current function:

import struct

import numpy as np
from scipy.signal import find_peaks


def pitch_calculations(stream, CHUNK, RATE):
    # Read mic stream and then call struct.unpack to convert from binary data back to ints
    data = stream.read(CHUNK, exception_on_overflow=False)
    dataInt = np.array(struct.unpack(str(CHUNK) + 'h', data))

    # Apply a window function (Hamming) to the input data
    windowed_data = np.hamming(CHUNK) * dataInt

    # Use NumPy's fast Fourier transform to convert mic data into frequencies
    fft_result = np.abs(np.fft.fft(windowed_data)) * 2 / (11000 * CHUNK)
    freqs = np.fft.fftfreq(len(windowed_data), d=1.0 / RATE)

    # Keep only the non-negative-frequency half of the spectrum
    fft_result = fft_result[:len(fft_result) // 2]
    freqs = freqs[:len(freqs) // 2]

    # Find the indices of local maxima in the frequency spectrum using scipy.signal.find_peaks
    localmax_indices = find_peaks(fft_result, height=0.04)[0]

    # Get the magnitudes of the local maxima
    strong_freqs = fft_result[localmax_indices]

    # Sort the magnitudes in descending order
    sorted_indices = np.argsort(strong_freqs)[::-1]

    # Calculate the pitches from the peaks, then the note name and how many cents sharp/flat
    pitches = [abs(freqs[localmax_indices[index]]) for index in sorted_indices]

    tunings = [calculate_tuning(pitch) for pitch in pitches]
    note_list = [note for note, _ in tunings]
    cent_list = [cents for _, cents in tunings]
    note_list = order_pitches(note_list)
    note_list = remove_duplicates(note_list)

    return note_list, cent_list
celewis
    Every `np.` you do _is_ invoking C (or Fortran, or otherwise non-Python) code. Have you run a profiler to figure out where your time is going and determined that it _is_ inside Python in the first place, as opposed to something that genuinely is already C still being slower than you'd prefer? – Charles Duffy Jun 21 '23 at 15:42
    NumPy functions (`np.fft.fft` & friends) are already C or FORTRAN code, so they're already "fast". You can [profile your code](https://stackoverflow.com/questions/582336/how-do-i-profile-a-python-script) to see what lines are the slowest. It could be that the majority of time is actually spent in some Python code. – ForceBru Jun 21 '23 at 15:45
  • It is unlikely that `fft.fft` isn't using some kind of C or assembler in the background. The one thing that could improve the speed of the FFT function itself is use dedicated hardware (like a GPU, which is good at parallel computing which can be used for FFTs, especially when you do many FFTs). – 12431234123412341234123 Jun 21 '23 at 15:45
  • all the good libraries already call C functions under the hood which even uses SIMD and parallelization so it's unlikely the reimplement that yourself will outperform those – phuclv Jun 21 '23 at 15:49
    After profiling my code, it appears the actual source of the slow performance is PyAudio. almost all of the time is used reading the audio data before it's even processed... not sure what to do about that, as I don't want to use a smaller chunk size. – celewis Jun 21 '23 at 16:10
  • @celewis I don't see how the chunk size has a big impact on PyAudio. But you have to test different chunk sizes and see if the time spent in PyAudio changes. BTW: Are you sure the time spent is actually time used to do calculation? Or is PyAudio waiting for data? – 12431234123412341234123 Jun 21 '23 at 16:19
  • @12431234123412341234123 I lowered the chunk size from 2^14 to 2^13, and the time dropped to almost half what it took before (but it became less accurate). The function call that is taking the majority of the runtime is "built-in method pyaudio._portaudio.read_stream", which makes me think the bottleneck is the reading of the audio data coming in – celewis Jun 21 '23 at 16:43
  • An old maxim: _If you can't measure it, you can't tune it_ I'd add profiling for virtually each line of your posted function as they each seem to do something significant [in time]. Then, you can make some judgements about what is actually slow. The actual timed results can often be surprising. As others have mentioned, it's important to separate out the I/O waiting time from the calculation time. Maybe you could edit your code to do this, append the new code to the post and add the benchmark results. – Craig Estey Jun 21 '23 at 16:46
  • @celewis Does the total time drop (assuming you always read the same amount of data), or the time to read a single chunk? It is obvious that reading a single chunk will be shorter, but cutting the size of the chunks in half means you do double the amount of FFTs you do (so you end up with analysing the same amount of data, in total, just spread out across more FFTs), correct? – 12431234123412341234123 Jun 21 '23 at 16:53
    Audio is slow. The most common audio sample rates are 44100 samples per second used by audio CDs, and 48000 samples per second used with digital video. If you want 16384 samples that's 16384/44100 seconds or 16384/48000 seconds. Either way, it's more than 1/3 of a second. – user3386109 Jun 21 '23 at 18:30
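The profiling workflow suggested in these comments can be sketched with the standard-library `cProfile` and `pstats` modules. The function profiled below is a hypothetical stand-in (the real `pitch_calculations` needs a live PyAudio stream); the point is the harness, which would reveal whether the time is spent in `_portaudio.read_stream` or in the FFT:

```python
import cProfile
import pstats

import numpy as np


def pitch_calculations_stub(chunk):
    # Stand-in for the real function so this harness is runnable offline;
    # in practice, wrap the actual pitch_calculations(stream, CHUNK, RATE).
    data = np.random.randn(chunk)
    np.abs(np.fft.fft(np.hamming(chunk) * data))


profiler = cProfile.Profile()
profiler.enable()
for _ in range(10):
    pitch_calculations_stub(2**14)
profiler.disable()

# Sort by cumulative time and show the top 5 entries; the slowest call
# (e.g. a blocking stream read) would appear at the top of this list.
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(5)
```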

1 Answer


As others have said, numpy already performs the underlying FFT computation with optimized native code.

The real problem is that a transform of size 2^14 has a nontrivial cost in any implementation. Suggestions to speed it up:

  • Reduce the audio sample rate and/or the transform size, if possible. 2^14 seems very high for determining musical note frequencies. It might be enough to downsample the audio to, say, 24 kHz (e.g. with librosa.resample) and use a transform size of 4096, for a spectrum with frequency resolution of 5.9 Hz per bin. Then use interpolation to estimate peaks with sub-bin precision.
  • Use float32 instead of float64: Before the transform, use windowed_data = windowed_data.astype(np.float32) to cast to float32.
  • Use real-to-complex FFTs with np.fft.rfft instead of the standard complex-to-complex transform. This might save nearly a factor of 2 in computation time.
  • Process a batch of windows at a time, instead of one by one. Particularly, a batch of FFTs is in some cases more efficiently computed than computing FFTs individually. Use for instance np.fft.rfft(x, axis=0) with x being a 2D array to compute the real-to-complex FFT of each column of x.
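The last three suggestions can be combined in one sketch. The random samples below stand in for real mic data, and `scipy.fft` is used instead of `np.fft` because `np.fft` computes in double precision internally, so the float32 saving only materializes with `scipy.fft` (or pyfftw):

```python
import numpy as np
import scipy.fft

RATE = 44100
N = 4096           # smaller transform size, as suggested above
NUM_WINDOWS = 8    # how many windows to transform in one batch

# Hypothetical stand-in for a stretch of mic samples; in practice this
# comes from stream.read() and struct.unpack as in the question.
samples = np.random.randn(NUM_WINDOWS * N).astype(np.float32)

# Batch the samples into columns: one window per column.
frames = samples.reshape(NUM_WINDOWS, N).T            # shape (N, NUM_WINDOWS)
window = np.hamming(N).astype(np.float32)[:, None]    # broadcasts over columns

# Real-to-complex FFT of every column at once, in single precision.
spectrum = np.abs(scipy.fft.rfft(frames * window, axis=0))
freqs = scipy.fft.rfftfreq(N, d=1.0 / RATE)           # only nonnegative bins

# Strongest bin in each window (skipping the DC bin), as a rough peak pick.
peak_bins = spectrum[1:].argmax(axis=0) + 1
peak_freqs = freqs[peak_bins]
```

Note that rfft already returns only the nonnegative-frequency half of the spectrum, so the manual `[:len(...) // 2]` slicing in the question's code is no longer needed.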
Pascal Getreuer