Fourier series data fit with numpy: fft vs coding

Question

Suppose I have some data, y, to which I would like to fit a Fourier series. On this post, a solution was posted by Mermoz using the complex format of the series and "calculating the coefficient with a riemann sum". On this other post, the series is obtained through the FFT and an example is written down.

I tried implementing both approaches (image and code below - notice everytime the code is run, different data will be generated due to the use of numpy.random.normal) but I wonder why I am getting different results - the Riemann approach seems "wrongly shifted" while the FFT approach seems "squeezed". I am also not sure about my definition of the period "tau" for the series. I appreciate the attention.

I am using Spyder with Python 3.7.1 on Windows 7

Example

import matplotlib.pyplot as plt
import numpy as np

# Assume x (independent variable) and y are the data.
# Arbitrary numerical values for question purposes:
start = 0
stop = 4
mean = 1
sigma = 2
N = 200
terms = 30 # number of terms for the Fourier series

x = np.linspace(start,stop,N,endpoint=True) 
y = np.random.normal(mean, sigma, len(x))

# Fourier series
tau = (max(x)-min(x)) # assume that signal length = 1 period (tau)

# From ref 1
def cn(n):
    c = y*np.exp(-1j*2*n*np.pi*x/tau)
    return c.sum()/c.size
def f(x, Nh):
    f = np.array([2*cn(i)*np.exp(1j*2*i*np.pi*x/tau) for i in range(1,Nh+1)])
    return f.sum()
y_Fourier_1 = np.array([f(t,terms).real for t in x])

# From ref 2
Y = np.fft.fft(y)
np.put(Y, range(terms+1, len(y)), 0.0) # zero-ing coefficients above "terms"
y_Fourier_2 = np.fft.ifft(Y)

# Visualization
f, ax = plt.subplots()
ax.plot(x,y, color='lightblue', label = 'artificial data')
ax.plot(x, y_Fourier_1, label = ("'Riemann' series fit (%d terms)" % terms))
ax.plot(x,y_Fourier_2, label = ("'FFT' series fit (%d terms)" % terms))
ax.grid(True, color='dimgray', linestyle='--', linewidth=0.5)
ax.set_axisbelow(True)
ax.set_ylabel('y')
ax.set_xlabel('x')
ax.legend()

In case anyone else ends up here having similar headaches; the expression for `f` might seem a bit strange because of the `2` before `cn(i)` multiplying the whole expression. I believe this was a "shortcut" used by the author of Ref.1 to account for the _negative frequencies_ , because normally the series is found written without this `2` and in a symmetric range - so that the imaginary terms of the complex numbers even out with their complex conjugate, and we end up with an entirely real function (well, here there will be some numerical left-over errors of the imaginary part). — andrerud, Mar 07 '19 at 10:06
Thus, if we write `f = np.array([cn(i)*np.exp(1j*2*i*np.pi*x/tau) for i in range(-Nh,Nh+1)])` ,the "wrong shift" mentioned in my question disappears and the approximation is what I was expecting, which I take as correct. However, I was not able to get the FFT approach to provide the same result. — andrerud, Mar 07 '19 at 10:07

francis · Accepted Answer · 2019-02-12T18:27:17.550

3

Performing two small modifications is sufficient to make the sums nearly similar to the output of np.fft. The FFTW library indeed computes these sums.

1) The average of the signal, c[0] is to be accounted for:

f = np.array([2*cn(i)*np.exp(1j*2*i*np.pi*x/tau) for i in range(0,Nh+1)]) # here : 0, not 1

2) The output must be scaled.

y_Fourier_1=y_Fourier_1*0.5

The output seems "squeezed" because the high frequency components have been filtered. Indeed, the high frequency oscillations of the input have been cleared and the output looks like a moving average.

Here, tau is actually defined as stop-start: it corresponds to the length of the frame. It is the expected period of the signal.

If the frame does not correspond to a period of the signal, you can guess its period by convoluting the signal with itself and finding the first maximum. See Find period of a signal out of the FFT Nevertheless, it is unlikely to work properly with a dataset generated by numpy.random.normal : this is an Additive White Gaussian Noise. As it features a constant power spectral density, it can hardly be discribed as periodic!

edited Feb 12 '19 at 18:27

answered Feb 11 '19 at 22:13

francis

9,525
2
25
41

Hi @francis, thanks a lot for the reply and remarks! I just ask for two small clarifications on your reply: 1) when you say the "high frequency components have been filtered", are you referring to the limited amount of terms the code is using for the series? 2) regarding "the peaks of the input have been cleared", I am not sure I understand what you mean, all the data of _y_ is being taken into account for the fit, no? – andrerud Feb 12 '19 at 07:48
Hi again, I was doing some more tests and I realized that if I account for the average of the signal in the following way, I get what I was expecting from the Riemann approach: `y_Fourier_1 = np.array([f(t,terms).real for t in x]) + np.mean(y)` leaving the range in the `f` function as 1. In other words, I use the half of only the c[0] term instead of halving the whole series as you wrote on your output scaling comment. Now the Riemann approach seems correctly scaled/shifted, but I still wonder why the FFT approach is "squeezed" in comparison to it. – andrerud Feb 12 '19 at 09:20
Thanks for your feedback. 1) Exactly ! The coefficient `Y[i]` corresponds to the frequency i/T, where T is the period. Consequently, keeping the first `terms` or zeroing coefficients by doing `np.put(Y, range(terms+1, len(y)), 0.0)` discards all high frequency components. Hence, the high frequency oscillations are removed from the input signal. 2) Yes, all the points of the input are accounted for. By performing FFT, the signal is written as a weighted sum of sine waves. Only low-frequency sine waves are kept. The word `peak` is missleading: it is to changed to `high frequency oscillations` – francis Feb 12 '19 at 17:59
1

Indeed, the frequency Y[0], or `cn(0)`, corresponds to the average of the input signal: all items of `np.exp(-1j*2*n*np.pi*x/tau)` are equal to 1 as n==0. Since a white noise is considered, the high frequency oscillations feature large magnitudes. As the high frequency oscillations are removed, the signal look like squeezed. – francis Feb 12 '19 at 18:14
Applying the discrete version of [Parseval's theorem](https://en.wikipedia.org/wiki/Parseval%27s_theorem), you can minimize the L2 norm between the compressed signal and the input signal by zeroing the coefficients featuring the smallest magnitude, although the indexes of non-zeroed frequencies must also be stored. Nevertheless, the white noise is almost the worse case for approches consisting in zeroing selected frequencies: the magnitudes of frequencies are all nearly similar! – francis Feb 12 '19 at 18:25
Hi again. I realized something is unclear for me: when you comment that by performing the FFT, the signal is written as a weighted sum of _sine_ waves, and only the sine waves, I do not understand why. Are the cosine terms not present? When I look at the "explicit" way of calculating it (summing each term of the complex series), only the real part is used; checking the conversion Complex Fourier Coefficients → Fourier Coefficients, the "real part" is associated with the coefficients of the cosine terms. What am I missing? – andrerud Mar 06 '19 at 09:39
1

Yes, you're right. The DFT actually writes the real function as a sum of complex exponentials. The real part of exp(2pi i k x/T) corresponds to the cosine of frequency k/T, and the imaginary part corresponds the the sine of frequency k/T. I wrote that the function is written as sum of [sine waves](https://en.wikipedia.org/wiki/Sine_wave) because a cosine is a sine wave of phase pi/2. The cosine terms are present. – francis Mar 06 '19 at 12:31

Fourier series data fit with numpy: fft vs coding

1 Answers1

Linked