9

What I am trying to achieve is the following: I need the frequency values of a sound file (.wav) for analysis. I know a lot of programs will give a visual graph (spectrogram) of the values, but I need the raw data. I know this can be done with an FFT and should be fairly easy to script in Python, but I am not sure exactly how to do it. So let's say that a signal in a file is 0.4 s long; then I would like multiple measurements, with the output as an array giving, for each time point the program measures, the value (frequency) it found, and possibly the power (dB) too. The complicated thing is that I want to analyse bird songs, and they often have harmonics, or the signal spans a range of frequencies (e.g. 1000-2000 Hz). I would like the program to output this information as well, since it is important for the analysis I would like to do with the data :)

Now there is a piece of code that looks very much like what I want, but I think it does not give me all the values I need (thanks to Justin Peel for posting this in answer to a different question :)). I gather that I need numpy and pyaudio, but unfortunately I am not familiar with Python, so I am hoping that a Python expert can help me with this.

Source Code:

# Read in a WAV and find the freq's
import pyaudio
import wave
import numpy as np

chunk = 2048

# open up a wave
wf = wave.open('test-tones/440hz.wav', 'rb')
swidth = wf.getsampwidth()
RATE = wf.getframerate()
# use a Blackman window
window = np.blackman(chunk)
# open stream
p = pyaudio.PyAudio()
stream = p.open(format =
                p.get_format_from_width(wf.getsampwidth()),
                channels = wf.getnchannels(),
                rate = RATE,
                output = True)

# read some data
data = wf.readframes(chunk)
# play stream and find the frequency of each chunk
while len(data) == chunk*swidth:
    # write data out to the audio stream
    stream.write(data)
    # unpack the data (assumes 16-bit mono samples) and multiply by the Blackman window
    indata = np.frombuffer(data, dtype=np.int16) * window
    # Take the fft and square each value
    fftData=abs(np.fft.rfft(indata))**2
    # find the maximum
    which = fftData[1:].argmax() + 1
    # use quadratic interpolation around the max
    if which != len(fftData)-1:
        y0,y1,y2 = np.log(fftData[which-1:which+2:])
        x1 = (y2 - y0) * .5 / (2 * y1 - y2 - y0)
        # find the frequency and output it
        thefreq = (which+x1)*RATE/chunk
        print("The freq is %f Hz." % (thefreq))
    else:
        thefreq = which*RATE/chunk
        print("The freq is %f Hz." % (thefreq))
    # read some more data
    data = wf.readframes(chunk)
if data:
    stream.write(data)
stream.close()
p.terminate()
Steve Tjoa
Mieke Zwart
  • 3
    Did you try "search" yet? This question has been asked. http://stackoverflow.com/questions/2648151/python-frequency-detection for example. – S.Lott Dec 13 '10 at 19:20
  • 1
    Yes, this is at least the 5th time this question has come up on SO in the last 2 weeks. – Brad Dec 13 '10 at 20:07
  • Yes, I had searched and looked around, but didn't find the exact answer I needed. While searching further, though, I found a free program that does exactly what I need: Sound Analysis Pro, in case anyone else reads this question and is looking to do similar things. You can export the data (frequency etc.) from this program to either Excel or MATLAB! – Mieke Zwart Dec 16 '10 at 15:23

2 Answers

8

I'm not sure whether this is what you want, but if you just want the FFT:

import scikits.audiolab, scipy
x, fs, nbits = scikits.audiolab.wavread(filename)
X = scipy.fft(x)

If you want the magnitude response:

import pylab
Xdb = 20*scipy.log10(scipy.absolute(X))
f = scipy.linspace(0, fs, len(Xdb))
pylab.plot(f, Xdb)
pylab.show()
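
For readers without `scikits.audiolab`, or on a recent SciPy where `scipy.fft` is a module rather than a function, roughly the same result can be sketched with only the standard `wave` module and NumPy. This is a minimal sketch under stated assumptions, not the code above: it assumes 16-bit PCM input, and the helper names `read_wav_mono` and `magnitude_spectrum_db` are made up for illustration. Stereo files are averaged down to one channel, and `np.fft.rfftfreq` gives the frequency in Hz that belongs to each FFT bin.

```python
import wave
import numpy as np

def read_wav_mono(path):
    """Read a 16-bit PCM WAV file; return (samples as float array, sample rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        fs = wf.getframerate()
        nchannels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    x = np.frombuffer(raw, dtype="<i2").astype(np.float64)
    # Samples are interleaved (L, R, L, R, ...); average the channels to mono.
    x = x.reshape(-1, nchannels).mean(axis=1)
    return x, fs

def magnitude_spectrum_db(x, fs):
    """Return (frequencies in Hz, magnitude in dB) for a real-valued signal."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # The small offset avoids log10(0) for empty bins.
    Xdb = 20 * np.log10(np.abs(X) + 1e-12)
    return f, Xdb
```

The FFT output holds one complex value per frequency bin, not a single frequency; `f[np.argmax(Xdb)]` picks out the strongest frequency over the whole file.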
ssundarraj
Steve Tjoa
  • I got this to work but only on mono sound files. Stereo seems to be a problem – Mieke Zwart Dec 16 '10 at 15:24
  • 1
    Printing X gives this output: `[-1.15917969+0.j -0.06542969+0.j -0.06542969+0.j ..., -0.06542969+0.j -0.06542969+0.j -0.06542969+0.j] ` But I should get only one frequency, right? Where is the frequency? – optimus prime Jun 13 '16 at 09:57
5

I think that what you need to do is a Short-Time Fourier Transform (STFT). Basically, you take FFTs of short, partially overlapping windows of the signal, which gives you one spectrum for each point in time. Then you would find the peak (or peaks) in the spectrum for each point in time. I haven't done this myself, but I've looked into it some in the past and this is definitely the way to go.

There's some Python code to do a STFT here and here.
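
As a rough illustration of the idea (not the linked code), here is a minimal NumPy-only sketch. The function name `stft_peaks` and the defaults `nfft=1024`, `hop=256` and `rel_db=-20.0` are assumptions chosen for the example, not tuned values. For each frame it returns the power spectrum in dB plus every local maximum within `rel_db` of that frame's strongest bin, so harmonics come out as separate peaks.

```python
import numpy as np

def stft_peaks(x, fs, nfft=1024, hop=256, rel_db=-20.0):
    """Short-time spectra of x, plus the spectral peaks found in each frame.

    Returns (times, freqs, power_db, peaks), where peaks[i] is a list of
    (frequency_hz, level_db) local maxima within rel_db of frame i's maximum.
    """
    x = np.asarray(x, dtype=np.float64)
    if len(x) < nfft:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(nfft)
    starts = np.arange(0, len(x) - nfft + 1, hop)
    # One windowed frame per row; consecutive frames overlap by nfft - hop samples.
    frames = np.stack([x[s:s + nfft] * window for s in starts])
    power_db = 10 * np.log10(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-12)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    times = (starts + nfft / 2) / fs  # centre of each frame, in seconds

    peaks = []
    for row in power_db:
        # Local maxima: bins louder than both of their neighbours.
        is_peak = (row[1:-1] > row[:-2]) & (row[1:-1] > row[2:])
        idx = np.nonzero(is_peak)[0] + 1
        # Keep only peaks close enough to the loudest bin of this frame.
        idx = idx[row[idx] >= row.max() + rel_db]
        peaks.append([(float(freqs[i]), float(row[i])) for i in idx])
    return times, freqs, power_db, peaks
```

Given samples `x` and sample rate `fs` from a 0.4 s recording, `peaks[i]` lists the frequencies (and their levels in dB) found around `times[i]`, while `power_db` is the raw spectrogram data as a frames-by-frequency array. The frequency resolution is `fs / nfft`, so a longer `nfft` separates close harmonics better at the cost of time resolution.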

Ryan Fox
Justin Peel