6

I am trying to perform a FFT on a song (audio file in wav format, about 3 minutes long) which I created as follows, just in case it is relevant.

ffmpeg -i "$1" -vn -ab 128k -ar 44100 -y -ac 1 "${1%.webm}.wav"

Where $1 is the name of a webm file.

This is the code which is supposed to display a FFT of the given file:

import numpy as np
import matplotlib.pyplot as plt

# presume file already converted to wav.
file = os.path.join(temp_folder, file_name)

rate, aud_data = scipy.io.wavfile.read(file)

# wav file is mono.
channel_1 = aud_data[:]

fourier = np.fft.fft(channel_1)

plt.figure(1)
plt.plot(fourier)
plt.xlabel('n')
plt.ylabel('amplitude')
plt.show()

The problem is, it takes for ever. It takes so long that I cannot even show the output, since I've had plenty of time to research and write this post and it still has not finished.

I presume that the file is too long, since

print (aud_data.shape)

outputs (9218368,), but this looks like a real world problem, so I hope there is a way to obtain an FFT of an audio file somehow.

What am I doing wrong? Thank you.

edit

A better formulation of the question would be: is the FFT of any good in music processing? For example similarity of 2 pieces.

As pointed out in the comments, my plain approach is way too slow.

Thank you.

Gerry
  • 1,938
  • 3
  • 18
  • 25
  • Possible duplicate of [plotting spectrogram in audio analysis](https://stackoverflow.com/questions/47954034/plotting-spectrogram-in-audio-analysis) – Pavel Dec 26 '17 at 19:21
  • 3
    Well, FFT runs in O(nlogn) time, and you have ~10^8 samples, so yeah, what did you expect? – Mad Physicist Dec 26 '17 at 19:21
  • @Pavel. Closely related, but not quite the same. I don't think a dupe vote is warranted here. – Mad Physicist Dec 26 '17 at 19:23
  • @Physicist makes sense. So using FFT on music is not a thing? I figured it would be widely used for finding similar songs for instance. – Gerry Dec 26 '17 at 19:29
  • 4
    Since the power spectrum of music is time-varying you would typically use a STFT to take the FFT of successive (usually overlapping) short windows, where any given window can be considered to be statistically stationary (10 ms to 20 ms is typical for window size). The resulting 3D data structure is called a spectrogram and has a wide range of applications. – Paul R Dec 26 '17 at 19:57
  • using an exact power of 2 length will give best speed results with fft, apparently there's also GPU accelerated fft: https://www.google.com/search?q=python+cuda&oq=python+cuda – f5r5e5d Dec 26 '17 at 20:33
  • 2
    @PaulR thank you for your answer. I'll have a closer look at spectograms. If you post an answer I'll accept it. However, how can a bunch of repeated FFTs be this much faster than one FFT on the entire input? I don't understand. – Gerry Dec 26 '17 at 21:04
  • 2
    Two reasons: (i) FFT is O(n log n) - if you do the math then you will see that a number of small FFTs is more efficient than one large one; (ii) smaller FFTs are typically much more cache-friendly - the FFT makes log2(n) passes through the data, with a somewhat “random” access pattern, so it can make a huge difference if your n data points all fit in cache. – Paul R Dec 26 '17 at 21:11
  • See also: [this answer](https://stackoverflow.com/a/35703340/253056) and [this answer](https://stackoverflow.com/a/19277250/253056). – Paul R Dec 26 '17 at 21:18
  • @Gerry It might be a good idea to separate out the fft portion and the plotting portion. Plotting that many points will probably be incredibly slow even if the FFT is lightning fast. – Paul Dec 28 '17 at 22:27

2 Answers2

7

To considerably speed up the fft portion of your analysis, you can zero-pad out your data to a power of 2:

import scipy
import numpy as np
import matplotlib.pyplot as plt

# rate, aud_data = scipy.io.wavfile.read(file)
rate, aud_data = 44000, np.random.random((9218368,))

len_data = len(aud_data)

channel_1 = np.zeros([2**(int(np.ceil(np.log2(len_data)))),1])
channel_1[0:len_data] = aud_data

fourier = np.fft.fft(channel_1)

Here is an example of plotting the real component of the fourier transform of a few sine waves using the above method:

import numpy as np
import matplotlib.pyplot as plt

# rate, aud_data = scipy.io.wavfile.read(file)
rate = 44000
ii = np.arange(0, 9218368)
t = ii / rate
aud_data = np.zeros(len(t))
for w in [1000, 5000, 10000, 15000]:
    aud_data += np.cos(2 * np.pi * w * t)

# From here down, everything else can be the same
len_data = len(aud_data)

channel_1 = np.zeros(2**(int(np.ceil(np.log2(len_data)))))
channel_1[0:len_data] = aud_data

fourier = np.fft.fft(channel_1)
w = np.linspace(0, 44000, len(fourier))

# First half is the real component, second half is imaginary
fourier_to_plot = fourier[0:len(fourier)//2]
w = w[0:len(fourier)//2]

plt.figure(1)

plt.plot(w, fourier_to_plot)
plt.xlabel('frequency')
plt.ylabel('amplitude')
plt.show()
Wes Hardaker
  • 21,735
  • 2
  • 38
  • 69
Paul
  • 10,381
  • 13
  • 48
  • 86
  • The second input argument to `np.fft.fft` is the transform length. Set that to your computed power of two length, and you don't need to explicitly zero-pad the data. – Cris Luengo Jul 19 '23 at 15:33
1

The FFT is very useful in music, but not very useful when applied to a whole song. Typically a song is not so repetitive that lumping it all together would make sense.

About the computation time: 9218368 = 26 x 144037. That large prime factor makes the FFT very slow. Padding it or cropping it so that the length is a product of smaller integers will make it much faster (10 million points is really not all that much!).

Note that the length doesn't need to be a power of two to be efficient, multiples of 3, 5 or even 7 can still be quite efficient. For example use scipy.fft.next_fast_len() to find a good length to pad to.

Cris Luengo
  • 55,762
  • 10
  • 62
  • 120