Invertible STFT and ISTFT in Python

Question

Is there any general-purpose form of short-time Fourier transform with corresponding inverse transform built into SciPy or NumPy or whatever?

There's the pyplot specgram function in matplotlib, which calls ax.specgram(), which calls mlab.specgram(), which calls _spectral_helper():

#The checks for if y is x are so that we can use the same function to
#implement the core of psd(), csd(), and spectrogram() without doing
#extra calculations.  We return the unaveraged Pxy, freqs, and t.

but

This is a helper function that implements the commonality between the 204 #psd, csd, and spectrogram. It is NOT meant to be used outside of mlab

I'm not sure if this can be used to do an STFT and ISTFT, though. Is there anything else, or should I translate something like these MATLAB functions?

I know how to write my own ad-hoc implementation; I'm just looking for something full-featured, which can handle different windowing functions (but has a sane default), is fully invertible with COLA windows (istft(stft(x))==x), tested by multiple people, no off-by-one errors, handles the ends and zero padding well, fast RFFT implementation for real input, etc.

I'm looking for exactly the same thing, similar to Matlab's "spectrogram" function. — Kurt Peek, May 10 '14 at 21:08
@khpeek See http://matplotlib.org/api/mlab_api.html#matplotlib.mlab.specgram — endolith, May 10 '14 at 21:54
SciPy has this now: http://scipy.github.io/devdocs/generated/scipy.signal.stft.html — endolith, Mar 16 '17 at 01:55

score 65 · Answer 1 · edited Jul 24 '15 at 04:47

65

Here is my Python code, simplified for this answer:

import scipy, pylab

def stft(x, fs, framesz, hop):
    framesamp = int(framesz*fs)
    hopsamp = int(hop*fs)
    w = scipy.hanning(framesamp)
    X = scipy.array([scipy.fft(w*x[i:i+framesamp]) 
                     for i in range(0, len(x)-framesamp, hopsamp)])
    return X

def istft(X, fs, T, hop):
    x = scipy.zeros(T*fs)
    framesamp = X.shape[1]
    hopsamp = int(hop*fs)
    for n,i in enumerate(range(0, len(x)-framesamp, hopsamp)):
        x[i:i+framesamp] += scipy.real(scipy.ifft(X[n]))
    return x

Notes:

The list comprehension is a little trick I like to use to simulate block processing of signals in numpy/scipy. It's like blkproc in Matlab. Instead of a for loop, I apply a command (e.g., fft) to each frame of the signal inside a list comprehension, and then scipy.array casts it to a 2D-array. I use this to make spectrograms, chromagrams, MFCC-grams, and much more.
For this example, I use a naive overlap-and-add method in istft. In order to reconstruct the original signal the sum of the sequential window functions must be constant, preferably equal to unity (1.0). In this case, I've chosen the Hann (or hanning) window and a 50% overlap which works perfectly. See this discussion for more information.
There are probably more principled ways of computing the ISTFT. This example is mainly meant to be educational.

A test:

if __name__ == '__main__':
    f0 = 440         # Compute the STFT of a 440 Hz sinusoid
    fs = 8000        # sampled at 8 kHz
    T = 5            # lasting 5 seconds
    framesz = 0.050  # with a frame size of 50 milliseconds
    hop = 0.025      # and hop size of 25 milliseconds.

    # Create test signal and STFT.
    t = scipy.linspace(0, T, T*fs, endpoint=False)
    x = scipy.sin(2*scipy.pi*f0*t)
    X = stft(x, fs, framesz, hop)

    # Plot the magnitude spectrogram.
    pylab.figure()
    pylab.imshow(scipy.absolute(X.T), origin='lower', aspect='auto',
                 interpolation='nearest')
    pylab.xlabel('Time')
    pylab.ylabel('Frequency')
    pylab.show()

    # Compute the ISTFT.
    xhat = istft(X, fs, T, hop)

    # Plot the input and output signals over 0.1 seconds.
    T1 = int(0.1*fs)

    pylab.figure()
    pylab.plot(t[:T1], x[:T1], t[:T1], xhat[:T1])
    pylab.xlabel('Time (seconds)')

    pylab.figure()
    pylab.plot(t[-T1:], x[-T1:], t[-T1:], xhat[-T1:])
    pylab.xlabel('Time (seconds)')

STFT of 440 Hz sinusoid ISTFT of beginning of 440 Hz sinusoid ISTFT of end of 440 Hz sinusoid

edited Jul 24 '15 at 04:47

Trevor

716
6
9

answered Jul 31 '11 at 19:30

Steve Tjoa

59,122
18
90
101

Is there an unsimplified version online you can link to? – endolith Jul 31 '11 at 21:02
Not off the top of my head. But is there anything wrong with the above code? You can modify it, if necessary. – Steve Tjoa Aug 02 '11 at 04:41
No, but you said "simplified for this answer", so I assumed this was an abridged version of something else you wrote – endolith Aug 02 '11 at 04:44
1

Sorry for the confusion. Yes, simplified from my application-specific needs. Example features: if the input is a stereo signal, make it mono first; plot the spectrogram over a given frequency and time range; plot the log-spectrogram; round `framesamp` up to the nearest power of two; embed `stft` inside a `Spectrogram` class; etc. Your needs may differ. But the core code above still gets the job done. – Steve Tjoa Aug 02 '11 at 05:09
You're welcome. If you would like me to add certain features, feel free to ask. I removed excessive code that I thought would obscure the *intuition* behind the core STFT method. – Steve Tjoa Aug 02 '11 at 17:00
Well I understand the STFT, I was just looking for a robust pre-existing general function that could be used in different scenarios without worrying about speed optimization, off-by-one errors, or other mistakes. – endolith Aug 03 '11 at 04:05
3

Thanks for this code. Just a remark : what happens in `stft` if x is not a multiple of the `hop` length ? Shouldn't the last frame be zero-padded ? – Basj Nov 19 '13 at 14:08
I don't believe there is a problem if `x` is not a multiple of `hopsamp` because of `range(0, len(x)-framesamp, hopsamp)`. Example: `x` is 1003 samples long, `framesamp` is 200 samples, and `hopsamp` is 100 samples. `i` increments from 0, 100, ..., 700, 800, for a total of nine iterations. So `stft` is truncating a small portion from the end, yes. That's usually not a problem. Of course you can modify the code to pad the signal such that truncation does not occur; I just wrote some quick and dirty code for the purpose of this question, but it should run without error. – Steve Tjoa Nov 19 '13 at 18:58
@SteveTjoa I have tested this code and it is very helpful also for me. But if we are sampling voltage over time, after performing the stft with this code, what are the units of the output? Thanks! – Janus Gowda Oct 10 '14 at 13:40
Could you explain this more "In order to reconstruct the original signal the sum of the sequential window functions must be constant". You mean this must be true to reconstruct the signal without using the phase information from the stft. Why does this work? – BrainPermafrost Dec 01 '16 at 06:41

Basj · Answer 2 · 2013-12-05T19:50:28.297

Here is the STFT code that I use. STFT + ISTFT here gives perfect reconstruction (even for the first frames). I slightly modified the code given here by Steve Tjoa : here the magnitude of the reconstructed signal is the same as that of the input signal.

import scipy, numpy as np

def stft(x, fftsize=1024, overlap=4):   
    hop = fftsize / overlap
    w = scipy.hanning(fftsize+1)[:-1]      # better reconstruction with this trick +1)[:-1]  
    return np.array([np.fft.rfft(w*x[i:i+fftsize]) for i in range(0, len(x)-fftsize, hop)])

def istft(X, overlap=4):   
    fftsize=(X.shape[1]-1)*2
    hop = fftsize / overlap
    w = scipy.hanning(fftsize+1)[:-1]
    x = scipy.zeros(X.shape[0]*hop)
    wsum = scipy.zeros(X.shape[0]*hop) 
    for n,i in enumerate(range(0, len(x)-fftsize, hop)): 
        x[i:i+fftsize] += scipy.real(np.fft.irfft(X[n])) * w   # overlap-add
        wsum[i:i+fftsize] += w ** 2.
    pos = wsum != 0
    x[pos] /= wsum[pos]
    return x

Can you please explain what the results are? In a few words. I used your code and it works, but not sure how to interpret it yet... — El Dude, Mar 25 '15 at 00:33

endolith · Answer 3 · 2017-05-08T15:29:54.420

3

librosa.core.stft and istft look pretty similar to what I was looking for, though they didn't exist at the time:

librosa.core.stft(y, n_fft=2048, hop_length=None, win_length=None, window=None, center=True, dtype=<type 'numpy.complex64'>)

They don't invert exactly, though; the ends are tapered.

edited May 08 '17 at 15:29

answered Jan 20 '15 at 02:32

endolith

25,479
34
128
192

score 2 · Accepted Answer · answered May 08 '17 at 14:48

2

I'm a little late to this, but realised scipy has inbuilt istft function as of 0.19.0

answered May 08 '17 at 14:48

tea_pea

1,482
14
19

Yeah it was added recently. https://github.com/scipy/scipy/pull/6058 I guess this should be the accepted answer, though. – endolith May 08 '17 at 15:29

capybaralet · Answer 5 · 2014-05-21T01:50:51.417

Neither of the above answers worked well OOTB for me. So I modified Steve Tjoa's.

import scipy, pylab
import numpy as np

def stft(x, fs, framesz, hop):
    """
     x - signal
     fs - sample rate
     framesz - frame size
     hop - hop size (frame size = overlap + hop size)
    """
    framesamp = int(framesz*fs)
    hopsamp = int(hop*fs)
    w = scipy.hamming(framesamp)
    X = scipy.array([scipy.fft(w*x[i:i+framesamp]) 
                     for i in range(0, len(x)-framesamp, hopsamp)])
    return X

def istft(X, fs, T, hop):
    """ T - signal length """
    length = T*fs
    x = scipy.zeros(T*fs)
    framesamp = X.shape[1]
    hopsamp = int(hop*fs)
    for n,i in enumerate(range(0, len(x)-framesamp, hopsamp)):
        x[i:i+framesamp] += scipy.real(scipy.ifft(X[n]))
    # calculate the inverse envelope to scale results at the ends.
    env = scipy.zeros(T*fs)
    w = scipy.hamming(framesamp)
    for i in range(0, len(x)-framesamp, hopsamp):
        env[i:i+framesamp] += w
    env[-(length%hopsamp):] += w[-(length%hopsamp):]
    env = np.maximum(env, .01)
    return x/env # right side is still a little messed up...

endolith · Answer 6 · 2011-07-31T21:09:37.973

1

Found another STFT, but no corresponding inverse function:

http://code.google.com/p/pytfd/source/browse/trunk/pytfd/stft.py

def stft(x, w, L=None):
    ...
    return X_stft

w is a window function as an array
L is the overlap, in samples

edited Jul 31 '11 at 21:09

answered Mar 18 '10 at 13:54

endolith

25,479
34
128
192

I have tested this code. It freezed my computer for large data sets. The implementation by Steve Tjoa works much better. – Janus Gowda Oct 10 '14 at 13:42

endolith · Answer 7 · 2011-07-31T21:11:44.357

I also found this on GitHub, but it seems to operate on pipelines instead of normal arrays:

http://github.com/ronw/frontend/blob/master/basic.py#LID281

def STFT(nfft, nwin=None, nhop=None, winfun=np.hanning):
    ...
    return dataprocessor.Pipeline(Framer(nwin, nhop), Window(winfun),
                                  RFFT(nfft))


def ISTFT(nfft, nwin=None, nhop=None, winfun=np.hanning):
    ...
    return dataprocessor.Pipeline(IRFFT(nfft), Window(winfun),
                                  OverlapAdd(nwin, nhop))

score 0 · Answer 8 · answered Apr 06 '16 at 18:55

0

I think scipy.signal has what you are looking for. It has reasonable defaults, supports multiple window types, etc...

http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.signal.spectrogram.html

from scipy.signal import spectrogram
freq, time, Spec = spectrogram(signal)

answered Apr 06 '16 at 18:55

meatcomputer

19
2

1

There's no inverse function though https://github.com/scipy/scipy/issues/5757#issuecomment-191516652 – endolith Apr 06 '16 at 20:13

score 0 · Answer 9 · 2018-08-03T13:01:07.270

A fixed version of basj's answer.

import scipy, numpy as np
import matplotlib.pyplot as plt

def stft(x, fftsize=1024, overlap=4):
    hop=fftsize//overlap
    w = scipy.hanning(fftsize+1)[:-1]      # better reconstruction with this trick +1)[:-1]  
    return np.vstack([np.fft.rfft(w*x[i:i+fftsize]) for i in range(0, len(x)-fftsize, hop)])

def istft(X, overlap=4):   
    fftsize=(X.shape[1]-1)*2
    hop=fftsize//overlap
    w=scipy.hanning(fftsize+1)[:-1]
    rcs=int(np.ceil(float(X.shape[0])/float(overlap)))*fftsize
    print(rcs)
    x=np.zeros(rcs)
    wsum=np.zeros(rcs)
    for n,i in zip(X,range(0,len(X)*hop,hop)): 
        l=len(x[i:i+fftsize])
        x[i:i+fftsize] += np.fft.irfft(n).real[:l]   # overlap-add
        wsum[i:i+fftsize] += w[:l]
    pos = wsum != 0
    x[pos] /= wsum[pos]
    return x

a=np.random.random((65536))
b=istft(stft(a))
plt.plot(range(len(a)),a,range(len(b)),b)
plt.show()

Just to be sure, what does it fix exactly? (was there some mistake?) — Basj, Sep 19 '18 at 10:07

Антон Пименов · Answer 10 · 2023-03-04T19:14:32.913

Just share my solution

imports

import numpy as np
import matplotlib.pyplot as plt

define forward and inverse fft functions

def next_pow2(x):
    return int(np.ceil(np.log(x)/np.log(2)))

def stft(x, w_length, w_shift, nfft=None, window=np.hanning):
    w = window(w_length)
    x_length = x.shape[0]
    nfft = nfft if nfft else 2 ** next_pow2(w_length)
    assert nfft >= w_length
    
    n_step = 1
    n_pad = w_length - x_length
    if x_length > w_length:
        n_step += int(np.ceil((x_length - w_length) / w_shift))
        
        n_tail = np.mod((x_length - w_length), w_shift)
        n_pad = w_shift - n_tail if n_tail > 0 else 0
    
    x_padded = np.pad(x, [0, n_pad])
    y = np.empty((n_step, nfft), dtype="complex")
    for n in range(0, n_step):
        n_start = n * w_shift
        n_end = n_start + w_length
        y[n] = np.fft.fft(w * x[n_start:n_end], nfft)
    return y

def istft(x, w_length, w_shift, window):
    n_overlap = w_length - w_shift
    w = window(w_length)
    x_length = w_length + (x.shape[0] - 1) * w_shift
    
    y = np.zeros(x_length, dtype="float")
    window_fix = np.zeros(x_length, dtype="float")
    for _n, _s in enumerate(x):
        n_start = _n * w_shift
        n_end = n_start + w_length
        
        x_ifft = np.real(np.fft.ifft(_s)[:w_length])
        if _n == 0:
            y[n_start:n_end] = x_ifft
            window_fix[n_start:n_end] = w
        else:
            n_end_overlap = n_start + n_overlap
            y[n_start:n_end_overlap] = 0.5 * (y[n_start:n_end_overlap] + x_ifft[:n_overlap])
            y[n_end_overlap:n_end] = x_ifft[n_overlap:]
            
            window_fix[n_start:n_end_overlap] = 0.5 * (window_fix[n_start:n_end_overlap] + w[:n_overlap])
            window_fix[n_end_overlap:n_end] = w[n_overlap:]
            
    w_non_zero = window_fix != 0
    y[w_non_zero] = y[w_non_zero] / window_fix[w_non_zero]
            
    return y

compute toy example

w_length = 400  # 20ms for 16kHz signal 
w_step = 240  # 15ms for 16kHz signal - 10ms overlaped, standart for speech processig
window = np.hamming
y = 1 + np.random.randn(16000)
s = stft(y, w_length=w_length, w_shift=w_step, window=window)
y_recovery = istft(s, w_length=w_length, w_shift=w_step, window=window)

check result on edge of window

plt.subplot(2,1,1)
plt.plot(y[350:550])
plt.plot(y_recovery[350:550], '--')

plt.subplot(2,1,2)
plt.plot(y[0:200])
plt.plot(y_recovery[0:200], '--')

score -3 · Answer 11 · answered Aug 02 '11 at 01:31

-3

If you have access to a C binary library that does what you want, then use http://code.google.com/p/ctypesgen/ to generate a Python interface to that library.

answered Aug 02 '11 at 01:31

Michael Dillon

31,973
6
70
106

Invertible STFT and ISTFT in Python

11 Answers11

Linked