
I have the time series below

enter image description here

I want to check for cycles in order to remove them (as part of the usual pre-processing of time series), so I apply FFT.

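import numpy as np
import scipy.fftpack
import matplotlib.pyplot as plt

# Number of sample points
N = len(y)
# Sample spacing
T = 1.0 # 1 day
x = np.linspace(0.0, N*T, N)
yf = scipy.fftpack.fft(y)
xf = np.linspace(0.0, 1.0/(2.0*T), N//2)
# Single-sided amplitude spectrum
components = 2.0/N * np.abs(yf[:N//2])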

fig, ax = plt.subplots(1, 1, figsize=(10, 5))
ax.plot(xf, components)

This results in the following plot.

enter image description here

I want to remove the four greatest components. In order to do this I'm implementing the formula below.

enter image description here
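The formula image is not reproduced here. As a point of reference only, the sketch below shows how a single FFT bin maps back to a cosine/sine pair under the sign convention used by `np.fft.fft` (and `scipy.fftpack.fft`). It assumes a real-valued signal, a bin index `k` strictly between 0 and `N/2`, and samples taken at `n * T`; the helper name `component` is hypothetical and the formula in the image may be written differently.

```python
import numpy as np

def component(yf, k, N):
    """Time-domain contribution of FFT bin k (0 < k < N/2) of a real signal."""
    n = np.arange(N)                    # sample index; sample time is n * T
    a = 2.0 / N * np.real(yf[k])
    b = 2.0 / N * np.imag(yf[k])
    phase = 2 * np.pi * k * n / N       # frequency k / (N * T) times time n * T
    # The forward FFT uses exp(-i ...), so the sine term enters with a minus sign
    return a * np.cos(phase) - b * np.sin(phase)

# Synthetic check: one cycle that falls exactly on bin 5
N = 128
n = np.arange(N)
y_demo = 1.5 * np.sin(2 * np.pi * 5 * n / N + 0.3)
residual = y_demo - component(np.fft.fft(y_demo), 5, N)
```

In this sketch the frequency of bin `k` is `k / (N * T)` and the time axis is `n * T`, so the product does not depend on `T`.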

# Indices of the four largest spectral components, largest first
idx_max_comp = np.argsort(components)[-4:][::-1]

# Rebuild the cyclic part from those components and subtract it
cycle_signal = np.zeros(len(y))
for idx in idx_max_comp:
    a, b = (2.0/N) * np.real(yf[idx]), (2.0/N) * np.imag(yf[idx])
    fi = xf[idx]
    cycle_signal += (a * np.cos(2 * np.pi * fi * x)) + (b * np.sin(2 * np.pi * fi * x))

y = y - cycle_signal

But when I apply FFT again it's easy to see it didn't work.

enter image description here

Why?

Fernando Ferreira
Frias
  • Have you tried the statsmodels module, as shown here: http://stackoverflow.com/questions/20672236/time-series-decomposition-function-in-python? – Fernando Ferreira Feb 01 '17 at 02:13
  • Check this nice example I've found on another post: http://stackoverflow.com/questions/36968418/python-designing-a-time-series-filter-after-fourier-analysis – Fernando Ferreira Feb 01 '17 at 02:25
  • In the FFT array `yf` set the values corresponding to the peaks to zero. Since these peaks are not single points, you will have to set small ranges of values to zero. After that, just take the inverse FFT, and you will have your desired result. – Sci Prog Jul 21 '17 at 02:50
  • I am facing the same problem and I figured out the same solution. What I'd like to know, is there any mathematical way to figure out the bandwidths to remove, or is it done by eyeballing? – Fabio Capezzuoli May 17 '19 at 03:49
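
The suggestion in the comments (zero a small range of bins around each peak, then invert the FFT) could be sketched roughly as follows. This is a minimal illustration on synthetic data, not the asker's series: the function name `remove_peaks` and the parameters `n_peaks` and `half_width` are hypothetical, and the band width is a fixed number of bins chosen by hand rather than derived from the data.

```python
import numpy as np

def remove_peaks(y, n_peaks=4, half_width=1):
    """Zero the n_peaks largest non-DC spectral bands of y and invert the FFT."""
    y = np.asarray(y, dtype=float)
    yf = np.fft.rfft(y)
    mag = np.abs(yf)
    mag[0] = 0.0                        # skip the DC term (the mean) when ranking
    for _ in range(n_peaks):
        k = int(np.argmax(mag))
        lo = max(k - half_width, 1)     # never touch the DC bin
        hi = min(k + half_width + 1, len(yf))
        yf[lo:hi] = 0.0                 # remove the band around the peak
        mag[lo:hi] = 0.0                # and exclude it from the next search
    return np.fft.irfft(yf, n=len(y))

# Synthetic check: noise plus two cycles that fall exactly on FFT bins 8 and 32
rng = np.random.default_rng(0)
n_samples = 256
t = np.arange(n_samples)
cycles = (3.0 * np.sin(2 * np.pi * 8 * t / n_samples)
          + 2.0 * np.cos(2 * np.pi * 32 * t / n_samples))
noise = 0.1 * rng.standard_normal(n_samples)
cleaned = remove_peaks(cycles + noise, n_peaks=2, half_width=0)
```

Working on the complex spectrum directly avoids rebuilding each sinusoid by hand. On real data the peaks usually leak into neighbouring bins, so `half_width` would need to be larger than in this synthetic case.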

1 Answer


I think the problem is the following:

T = 1.0 # 1 day

The sampling frequency is the number of samples per second. With one sample a day, your sampling frequency is f = 1/(24*60*60) Hz, which is approximately 11.574 µHz (micro-Hertz), and your Nyquist frequency is half of that, about 5.787 µHz, which corresponds to a period of 2 days. This means that you can't detect cycles with a period shorter than two days.
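
The arithmetic above can be checked with a short sketch. The variable names and the series length `N = 365` are assumptions made for illustration; `np.fft.rfftfreq` with `d` set to the sample spacing in days returns the frequency axis in cycles per day.

```python
import numpy as np

T_days = 1.0                              # sample spacing: one sample per day
seconds_per_day = 24 * 60 * 60

fs_hz = 1.0 / (T_days * seconds_per_day)  # sampling frequency in Hz
nyquist_hz = fs_hz / 2.0                  # highest resolvable frequency in Hz
shortest_period_days = 1.0 / (nyquist_hz * seconds_per_day)

# Frequency axis in cycles per day for a hypothetical one-year series
N = 365
freqs = np.fft.rfftfreq(N, d=T_days)

print(f"fs      = {fs_hz * 1e6:.5f} uHz")
print(f"Nyquist = {nyquist_hz * 1e6:.5f} uHz")
print(f"shortest detectable period = {shortest_period_days:.1f} days")
```

Expressed in cycles per day, the axis never exceeds 0.5, which is the same two-day limit stated in other units.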

Dinidu Hewage
Egon