9

I saw someone do this in a presentation but I'm having a hard time reproducing what he was able to do. Here's a slide from his presentation:

Sinewave decomposition via FFT

Pretty cool. He decomposed a dataset using FFT, then plotted the appropriate sine waves that the FFT specified.

So in an effort to recreate what he did, I created a series of points that correspond to the combination of 2 sine waves:

import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

x = np.arange(0, 10, 0.01)
x2 = np.arange(0, 20, 0.02)
sin1 = np.sin(x)
sin2 = np.sin(x2)
x2 /= 2
sin3 = sin1 + sin2
plt.plot(x, sin3)
plt.show()

Composed wave

Now I want to decompose this wave (or rather, the wave that the points imply) back into the original 2 sine waves:

# goal: sin3 -> sin1, sin2
# sin3 
array([ 0.00000000e+00,  2.99985000e-02,  ... 3.68998236e-01])
# sin1 
array([ 0.        ,  0.00999983,  0.01999867,  ... -0.53560333])
# sin2 
array([ 0.        ,  0.01999867,  0.03998933, ... 0.90460157])

I start by importing numpy and getting the fft of sin3:

import numpy as np
fft3 = np.fft.fft(sin3)

ok, that's about as far as I get. Now I've got an array with complex numbers:

array([ 2.13316069e+02+0.00000000e+00j,  3.36520138e+02+4.05677438e+01j,...])

and if I naively plot it I see:

plt.plot(fft3)
plt.show()

fft naively plotted

Ok, not sure what to do with that.

I want to get from here to the datasets that look like sin1 and sin2:

plt.plot(sin1)
plt.show()

sin1 data plotted

plt.plot(sin2)
plt.show()

sin2 data plotted

I understand the real and imaginary part of the complex numbers in the fft3 dataset, I'm just not sure what to do with them to derive sin1 and sin2 datasets from it.

I know this has less to do with programming and more to do with math, but could anyone give me a hint here?

EDIT: update on Mark Snyder's answer:

Using Mark's code I was able to get what I expected and ended up with this method:

def decompose_fft(data: list, threshold: float = 0.0):
    fft3 = np.fft.fft(data)
    x = np.arange(0, 10, 10 / len(data))
    freqs = np.fft.fftfreq(len(x), .01)
    recomb = np.zeros((len(x),))
    for i in range(len(fft3)):
        if abs(fft3[i]) / len(x) > threshold:
            sinewave = (
                1 
                / len(x) 
                * (
                    fft3[i].real 
                    * np.cos(freqs[i] * 2 * np.pi * x) 
                    - fft3[i].imag 
                    * np.sin(freqs[i] * 2 * np.pi * x)))
            recomb += sinewave
            plt.plot(x, sinewave)
    plt.show()

    plt.plot(x, recomb, x, data)
    plt.show()

later I'll make it return the recombined list of waves, but for now I'm getting an anomalie I don't quite understand. First of all I call it like this, simply passing in a dataset.

decompose_fft(sin3, threshold=0.0)

But looks great but I get this weird line at y=0.2 Does anyone know what this could be or what's causing it?

Looks really good

EDIT:

The above question has been answered by Mark in the comments, thanks!

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
MetaStack
  • 3,266
  • 4
  • 30
  • 67
  • 1
    My bad, I think you need to plot them on a complex plane: https://www.tutorialspoint.com/How-to-Plot-Complex-Numbers-in-Python – Felipe Gutierrez Jan 14 '20 at 00:23
  • @FelipeGutierrez I'm unable to get much out of that, but either way, simply plotting it in the complex plane doesn't give me the datasets `sin1` or `sin2` (or tell me their frequencies, so I can derive them), right? This is the plot I attempted, looks cool, but doesn't tell me much. Unless I just don't know how to interpret it, of course: `plt.scatter(fft3.real, fft3.imag, color='red')` – MetaStack Jan 14 '20 at 00:30
  • 1
    I updated my answer with some code that might help you out. – Mark Snyder Jan 14 '20 at 20:56
  • By the way, the second argument of np.fft.fftfreq() should be the same as the third argument in np.arange(). – Mark Snyder Jan 15 '20 at 01:10
  • add parameter ith: int = 0 and modify the code and i > ith:, the weird line is gone – Anibal Yeh Jun 23 '22 at 14:13

2 Answers2

6

The discrete Fourier transform gives you the coefficients of complex exponentials that, when summed together, produce the original discrete signal. In particular, the k'th Fourier coefficient gives you information about the amplitude of the sinusoid that has k cycles over the given number of samples.

Note that since your sines don't have integer numbers of cycles in 1000 samples, you actually will not be able to retrieve your original sine waves using an FFT. Instead you'll get a blend of many different sinusoids, including a constant component of ~.4.

You can plot the various component sinusoids and observe that their sum is the original signal using the following code:

freqs = np.fft.fftfreq(len(x),.01)
threshold = 0.0
recomb = np.zeros((len(x),))
for i in range(len(fft3)):
    if abs(fft3[i])/(len(x)) > threshold:
        recomb += 1/(len(x))*(fft3[i].real*np.cos(freqs[i]*2*np.pi*x)-fft3[i].imag*np.sin(freqs[i]*2*np.pi*x))
        plt.plot(x,1/(len(x))*(fft3[i].real*np.cos(freqs[i]*2*np.pi*x)-fft3[i].imag*np.sin(freqs[i]*2*np.pi*x)))
plt.show()

plt.plot(x,recomb,x,sin3)
plt.show()

By changing threshold, you can also choose to exclude sinusoids of low power and see how that affects the final reconstruction.

EDIT: There's a bit of a trap in the above code, although it's not wrong. It hides the inherent symmetry of the DFT for real signals, and plots each of the sinusoids twice at half of their true amplitude. This code is more performant and plots the sinusoids at their correct amplitude:

import cmath

freqs = np.fft.fftfreq(len(x),.01)
threshold = 0.0
recomb = np.zeros((len(x),))
middle = len(x)//2 + 1
for i in range(middle):
    if abs(fft3[i])/(len(x)) > threshold:
        if i == 0:
            coeff = 2
        else:
            coeff = 1
        sinusoid = 1/(len(x)*coeff/2)*(abs(fft3[i])*np.cos(freqs[i]*2*np.pi*x+cmath.phase(fft3[i])))
        recomb += sinusoid
        plt.plot(x,sinusoid)
plt.show()

plt.plot(x,recomb,x,sin3)
plt.show()

If in the general case you know that the signal was composed of some subset of sinusoids with frequencies that may not line up correctly with the length of the signal, you may be able to identify the frequencies by either zero-padding or extending your signal. You can learn more about that here. If the signals are completely arbitrary and you're just interested in looking at component sinusoids, there's no need for that.

hamflow
  • 27
  • 6
Mark Snyder
  • 1,635
  • 3
  • 12
  • 1
    This is amazing! There's just one anomalie I don't understand, I've added an EDIT to the question explaining what I'm seeing – MetaStack Jan 15 '20 at 00:49
  • 3
    @LegitStack The line isn't an anomaly at all. It's a sinusoid with a frequency of 0 - that is to say, a constant! In fact, you should find that it's the mean of the data. Remember that each of the sinusoids (with frequency != 0) have a mean of 0. Without that straight line to hold the mean of the data, you couldn't find the Fourier transform of data that didn't have 0 mean. – Mark Snyder Jan 15 '20 at 01:06
3

There are some issues with discrete Fourier transform that are not immediately obvious from playing with its continuous counterpart. For one thing, the periodicity of your input should match the range of your data, so it's going to be much easier if you use:

x = np.linspace(0, 4*np.pi, 200)

You can then follow your original idea:

sin1 = np.sin(x)
sin2 = np.sin(2*x)
sin3 = sin1 + sin2
fft3 = np.fft.fft(sin3)

Since in FFT sin goes directly into the imaginary component, you can try plotting only the imaginary part:

plt.plot(fft3.imag)
plt.show()

What you should see will be peaks centered at x=2 and x=4 that correspond to the original sinusoidal components, which had frequencies of "2 per signal" (sin(x) from 0 to 4 pi) and "4 per signal" (sin(2x) from 0 to 4 pi).

To plot all individual components, you can go with:

for i in range(1,100):
  plt.plot(x, fft3.imag[i] * np.sin(i*x)/100)
plt.show()
  • Thanks for this, it's really great! I wonder if you could help me with a more general case. I was able to make what you suggested here work like a charm (actually I had to do `np.sin(i*x/2)`. But what about the case when the periodicity does not match the range of my data? What am I to do then, when I essentially don't have access to `x`? Even trying different mutations of `sin1` and `sin2` such as `sin2 = np.sin(1.9*x)` messed up the plots I was able to derive so that they didn't match the original. Can you help me understand and account for the more general case? – MetaStack Jan 14 '20 at 05:08
  • What you're roughly trying to do here is express `sin(1.9x)` as a sum of terms `sin(x)`, `sin(2x)`, `sin(3x)` etc (including also the corresponding cosine terms, since cos is orthogonal to sin). Obviously that can't be done with a single term, which is why you're seeing a number of contributions. It seems to me that the "clean" examples from lecture slides are mostly sums of sines plotted against explicitly defined components, not a DFT-based analysis of a real-life signal. ;) – Miłosz Wieczór Jan 14 '20 at 12:26
  • Thank you for this answer, it's been very educational, but unfortunately, it doesn't solve the general case as it is. I only created `sin1` and `sin2` in order to create `sin3` (a series of data points that cleanly represent a combination of a number of sine waves). In the real world application that this demo serves as a stepping stone to, I won't have `sin1` or `sin2` at the outset, I will only have a dataset analogous to `sin3` and will need to derive a constituent set of sine waves that approximates the dataset: given only the dataset. That is the general case I'm trying to solve for. – MetaStack Jan 14 '20 at 16:22
  • 1
    The issue here is that the "basic" frequency unit for DFT is the whole signal. So if you can fit exactly one period in the signal you're transforming, this will correspond to a frequency of one, and so on. What you want would only work for a continuous transform, as it uses a continuous spectrum of frequencies and any "pure" sine/cosine will yield a sharp peak. From a practical standpoint, though, my educated guess is that the more full periods you have in your signals, the better defined single-sine components you'll have - try comparing e.g. ranges from 0 to 4pi with ranges from 0 to 40 pi. – Miłosz Wieczór Jan 14 '20 at 20:18