5

I want to get frequency with maximum power for every moment in wav file. So I wrote STFT in Python using fft from scipy. I used kaiser window function from scipy. Everything looking great, but my output looks strange. It has some very small numbers and some very high.

here is the output for one wav file: http://pastebin.com/5Ryd2uXj and here is the code in python:

import scipy, pylab
import wave
import struct
import sys

def stft(data, cp, do, hop):
    dos = int(do*cp)
    w = scipy.kaiser(dos,12) //12 is very high for kaiser window
    temp=[]
    wyn=[]
    for i in range(0, len(data)-dos, hop):
        temp=scipy.fft(w*data[i:i+dos])
        max=-1
        for j in range(0, len(temp),1):
            licz=temp[j].real**2+temp[j].imag**2
            if( licz>max ):
                max = licz
                maxj = j
        wyn.append(maxj)
    #wyn = scipy.array([scipy.fft(w*data[i:i+dos])
        #for i in range(0, len(data)-dos, 1)])
    return wyn

file = wave.open(sys.argv[1])
bity = file.readframes(file.getnframes())
data=struct.unpack('{n}h'.format(n=file.getnframes()), bity)
file.close()

cp=44100 #sampling frequency
do=0.05 #window size
hop = 5

wyn=stft(data,cp,do,hop)
print len(wyn)
for i in range(0, len(wyn), 1):
    print wyn[i]
user1226419
  • 53
  • 1
  • 3
  • 2
    Did you try testing it against a known waveform like a sine wave to see if you get the expected output? – steve8918 Feb 22 '12 at 17:13
  • I just found this: http://stackoverflow.com/questions/2459295/stft-and-istft-in-python It looks similar and i see that on plot of sinus are 2 lines, not 1. I have the same in my output for sinus. I don't know why... – user1226419 Feb 22 '12 at 19:01

1 Answers1

6

The actual FT of a sine wave is a pair of delta functions equidistant from 0-frequency. With a discrete function (samples), this is repeated every fs (sampling rate) in the frequency domain. Small errors in FFT computation will mean these two deltas (FT of your sine wave) will not be exactly the same height, so your algorithm is simply picking the taller one.

The scipy FFT function will give you frequency components with the domain [0, fs]. Since (as I mentioned above) this is periodic, these values could also be remapped as [-fs/2, fs/2] by swapping the result at the center point - look into using fftshift to do this. It sounds like you may only be interested in the positive frequencies, however, so you can simply discard the second half of the result of your FFT.

From the notes of scipy.fftpack.fft:

The packing of the result is “standard”: If A = fft(a, n), then A[0] contains the zero-frequency term, A[1:n/2+1] contains the positive-frequency terms, and A[n/2+1:] contains the negative-frequency terms, in order of decreasingly negative frequency. So for an 8-point transform, the frequencies of the result are [ 0, 1, 2, 3, 4, -3, -2, -1].

aganders3
  • 5,838
  • 26
  • 30