
I have some data that consists of a sequence of video frames which represent changes in luminance over time relative to a moving baseline. In these videos there are two kinds of 'event' that can occur: 'localised' events, which consist of luminance changes in small groups of clustered pixels, and contaminating 'diffuse' events, which affect most of the pixels in the frame:

[Figure: example frames illustrating a 'localised' event (small cluster of pixels) and a 'diffuse' event (most of the frame)]

I'd like to be able to isolate local changes in luminance from diffuse events. I'm planning on doing this by subtracting an appropriately low-pass filtered version of each frame. In order to design an optimal filter, I'd like to know which spatial frequencies of my frames are modulated during diffuse and local events, i.e. I'd like to generate a spectrogram of my movie over time.
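
To make the idea concrete, here is a minimal sketch of that subtraction step (the helper name highpass_frames, the choice of a Gaussian kernel and the sigma=20 value are all placeholders; choosing the cutoff properly is the point of the spectral analysis below):

import numpy as np
from scipy.ndimage import gaussian_filter

def highpass_frames(frames, sigma=20.0):
    """Suppress diffuse (low spatial frequency) events by subtracting
    a Gaussian low-pass filtered copy from each frame. `frames` is a
    (n_frames, n_rows, n_cols) array; `sigma` is in pixels and is a
    placeholder to be chosen from the spectral analysis."""
    out = np.empty(frames.shape, dtype=float)
    for ii, frame in enumerate(frames):
        out[ii] = frame - gaussian_filter(frame.astype(float), sigma)
    return out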

I can find lots of information about generating spectrograms for 1D data (e.g. audio), but I haven't come across much on generating spectrograms for 2D data. What I've tried so far is to generate a 2D power spectrum from the Fourier transform of the frame, then perform a polar transformation about the DC component and then average across angles to get a 1D power spectrum:

[Figure: windowed frame, its 2D power spectrum, the polar-transformed power spectrum, and the angle-averaged 1D power spectrum]

I then apply this to every frame in my movie, and generate a raster plot of spectral power over time:

[Figure: raster plot of 1D spectral power (frequency vs. time) across frames]
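
In code, the loop looks something like this (a minimal sketch using the compute_2d_psd function given below; frames is assumed to be a (n_frames, n_rows, n_cols) array):

def movie_spectrogram(frames):
    """Stack the radially averaged 1D power spectrum of each frame
    into a (n_frames, n_freqs) array for the raster plot."""
    spectra = []
    for frame in frames:
        mpow, _, pos_freq = compute_2d_psd(frame, doplot=False)
        spectra.append(mpow)
    return np.array(spectra), pos_freq

# spec, pos_freq = movie_spectrogram(frames)
# pp.imshow(np.log10(spec), aspect='auto', cmap=pp.cm.hot,
#           extent=(pos_freq[0], pos_freq[-1], 0, spec.shape[0]))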

Does this seem like a sensible approach to take? Is there a more 'standard' approach to doing spectral analysis on 2D data?

Here's my code:

import numpy as np
# from pyfftw.interfaces.scipy_fftpack import fft2, fftshift, fftfreq
from scipy.fftpack import fft2, fftshift, fftfreq
from matplotlib import pyplot as pp
from matplotlib.colors import LogNorm
from scipy.signal import windows
from scipy.ndimage import map_coordinates

def compute_2d_psd(img, doplot=True, winfun=windows.hamming, winfunargs={}):

    nr, nc = img.shape
    win = make2DWindow((nr, nc), winfun, **winfunargs)

    f2 = fftshift(fft2(img*win))
    psd = np.abs(f2*f2)
    pol_psd = polar_transform(psd, centre=(nr//2, nc//2))

    mpow = np.nanmean(pol_psd, 0)
    stdpow = np.nanstd(pol_psd, 0)

    freq_r = fftshift(fftfreq(nr))
    freq_c = fftshift(fftfreq(nc))
    pos_freq = np.linspace(0, np.hypot(freq_r[-1], freq_c[-1]), 
        pol_psd.shape[1])

    if doplot:
        fig,ax = pp.subplots(2,2)

        im0 = ax[0,0].imshow(img*win, cmap=pp.cm.gray)
        ax[0,0].set_axis_off()
        ax[0,0].set_title('Windowed image')

        lnorm = LogNorm(vmin=psd.min(), vmax=psd.max())
        ax[0,1].set_facecolor('k')
        im1 = ax[0,1].imshow(psd, extent=(freq_c[0], freq_c[-1], 
            freq_r[0], freq_r[-1]), aspect='auto', 
            cmap=pp.cm.hot, norm=lnorm)
        # cb1 = pp.colorbar(im1, ax=ax[0,1], use_gridspec=True)
        # cb1.set_label('Power (A.U.)')
        ax[0,1].set_title('2D power spectrum')

        ax[1,0].set_facecolor('k')
        im2 = ax[1,0].imshow(pol_psd, cmap=pp.cm.hot, norm=lnorm, 
            extent=(pos_freq[0],pos_freq[-1],0,360), 
            aspect='auto')
        ax[1,0].set_ylabel('Angle (deg)')
        ax[1,0].set_xlabel('Frequency (cycles/px)')
        # cb2 = pp.colorbar(im2, ax=(ax[0,1],ax[1,1]), use_gridspec=True)
        # cb2.set_label('Power (A.U.)')
        ax[1,0].set_title('Polar-transformed power spectrum')

        # ax[1,1].fill_between(pos_freq, mpow - stdpow, mpow + stdpow, 
        #   color='r', alpha=0.3)
        ax[1,1].axvline(0, c='k', ls='--', alpha=0.3)
        ax[1,1].plot(pos_freq, mpow, lw=3, c='r')
        ax[1,1].set_xlabel('Frequency (cycles/px)')
        ax[1,1].set_ylabel('Power (A.U.)')
        ax[1,1].set_yscale('log')
        ax[1,1].set_xlim(-0.05, None)
        ax[1,1].set_title('1D power spectrum')

        fig.tight_layout()

    return mpow, stdpow, pos_freq

def make2DWindow(shape,winfunc,*args,**kwargs):
    assert callable(winfunc)
    r,c = shape
    rvec = winfunc(r,*args,**kwargs)
    cvec = winfunc(c,*args,**kwargs)
    return np.outer(rvec,cvec)

def polar_transform(image, centre=(0,0), n_angles=None, n_radii=None):
    """
    Polar transformation of an image about the specified centre coordinate
    """
    shape = image.shape
    if n_angles is None:
        n_angles = shape[0]
    if n_radii is None:
        n_radii = shape[1]
    theta = -np.linspace(0, 2*np.pi, n_angles, endpoint=False).reshape(-1,1)
    d = np.hypot(shape[0]-centre[0], shape[1]-centre[1])
    radius = np.linspace(0, d, n_radii).reshape(1,-1)
    x = radius * np.sin(theta) + centre[0]
    y = radius * np.cos(theta) + centre[1]

    # nb: map_coordinates can give crazy negative values using higher order
    # interpolation, which introduce nans when you take the log later on
    output = map_coordinates(image, [x, y], order=1, cval=np.nan, 
        prefilter=True)
    return output
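
For reference, here's a quick way to exercise compute_2d_psd on synthetic data (a hypothetical test frame, not my real data):

if __name__ == '__main__':
    rng = np.random.RandomState(0)
    yy, xx = np.mgrid[0:256, 0:256]
    # low spatial frequency background, standing in for a 'diffuse' event
    frame = np.sin(2 * np.pi * yy / 128.)
    # a handful of small Gaussian blobs, standing in for 'localised' events
    for cy, cx in rng.randint(20, 236, (5, 2)):
        frame += 2 * np.exp(-((yy - cy)**2 + (xx - cx)**2) / (2 * 3.0**2))
    mpow, stdpow, pos_freq = compute_2d_psd(frame)
    pp.show()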
ali_m
    What you are doing is similar to what is known in the digital halftoning world as a RAPS (Radially Averaged Power Spectrum), so it does make sense in a general sense, not sure about your video application... – Jaime Dec 02 '13 at 14:49
  • @Jaime That's useful to know - I'll have a read into the RAPS literature – ali_m Dec 02 '13 at 15:18
  • I would solve that by a thresholding of each frame, so that only the events are binarized, labeling the binarized frames with ndimage and check for a number/clustersize as an upper-limit. – Dschoni Feb 14 '14 at 16:16
  • @Dschoni It's not really clear from that particular example, but there are also 'localised' events that manifest simultaneously in large numbers of circular blobs (actually brain cells), and 'diffuse' events that are still 'fuzzy' but affect a more spatially restricted portion of the frame. For that reason, thresholding the mean pixel value of each frame (which is what I think you're suggesting) probably won't work very well in my case. I do have a semi-working solution that uses spatiotemporal ICA to extract signals that are sparse in both space and time (see the sketch after these comments). – ali_m Feb 14 '14 at 21:05
  • @ali_m: So how do you basically distinguish "event" from noise? As I understand, you do that somehow (by change in grey-values?) and afterwards want to distinguish different kinds of events. – Dschoni Feb 18 '14 at 12:46
  • @Dschoni essentially according to their weighted sparseness in space and in time – ali_m Feb 18 '14 at 15:48
  • @ali_m: So basically, you could stack all the images along the time domain and have a 3D image volume. Then the sparseness in time becomes a sparseness along the z-axis, and you could do volume analysis of these sparse 3D blobs... Just thinking out loud. – Dschoni Feb 19 '14 at 00:58
  • Extracting events from 2-photon calcium imaging of neurons is a notoriously unsolved problem. Have you tried any of the standard approaches in the neuroscience literature? – cxrodgers Sep 17 '14 at 06:56
  • If you have an idea of the size of the events you are trying to detect, maybe a band-pass filter on a 2D discrete wavelet transform could do the job? – Matthieu Pizenberg May 28 '15 at 16:54
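
For reference, a minimal sketch of the spatial-ICA idea mentioned in the comments above, using scikit-learn's FastICA (a true spatiotemporal ICA additionally weights sparseness in space against sparseness in time, which plain FastICA does not):

import numpy as np
from sklearn.decomposition import FastICA

def spatial_ica(frames, n_components=20):
    """Unmix a (n_frames, n_rows, n_cols) movie into spatially
    independent component maps and their time courses. This is plain
    spatial ICA, a rough stand-in for the spatiotemporal variant
    mentioned in the comments."""
    n_frames, nr, nc = frames.shape
    X = frames.reshape(n_frames, nr * nc).astype(float)
    ica = FastICA(n_components=n_components, random_state=0)
    timecourses = ica.fit_transform(X)            # (n_frames, n_components)
    maps = ica.components_.reshape(n_components, nr, nc)
    return timecourses, maps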

1 Answer


I believe that the approach you describe is in general the best way to do this analysis.

However, I did spot an error in your code:

np.abs(f2*f2)

is not the PSD of the complex array f2; you need to multiply f2 by its complex conjugate rather than by itself (|f2^2| is not the same as |f2|^2).

Instead, you should do something like:

(f2 * np.conjugate(f2)).real

Or, more cleanly:

np.abs(f2)**2.

The oscillations in the 2D power spectrum are a tell-tale sign of this kind of error (I've done this before myself!).

jtrayford