6

I am using Python to perform a Fast Fourier Transform on some data. I then need to extract the locations of the peaks in the transform in the form of the x-values. Right now I am using Scipy's fft tool to perform the transform, which seems to be working. However, when i use Scipy's find_peaks I only get the y-values, not the x-position that I need. I also get the warning:

ComplexWarning: Casting complex values to real discards the imaginary part

Is there a better way for me to do this? Here is my code at the moment:

import pandas as pd
import matplotlib.pyplot as plt
from scipy.fft import fft
from scipy.signal import find_peaks

headers = ["X","Y"]

original_data = pd.read_csv("testdata.csv",names=headers)

x = original_data["X"]
y = original_data["Y"]

a = fft(y)
peaks = find_peaks(a)
print(peaks)

plt.plot(x,a)
plt.title("Fast Fourier transform")
plt.xlabel("Frequency")
plt.ylabel("Amplitude")
plt.show()
MUD
  • 121
  • 1
  • 13

2 Answers2

7

There seem to be two points of confusion here:

  1. What find_peaks is returning.
  2. How to interpret complex values that the FFT is returning.

I will answer them separately.

Point #1

find_peaks returns the indices in "a" that correspond to peaks, so I believe they ARE values you seek, however you must plot them differently. You can see the first example from the documentation here. But basically "peaks" is the index, or x value, and a[peaks] will be the y value. So to plot all your frequencies, and mark the peaks you could do:

plt.plot(a)
plt.plot(peaks, a[peaks])

Point #2

As for the second point, you should probably read up more on the output of FFTs, this post is a short summary but you may need more background to understand it.

But basically, an FFT will return an array of complex numbers, which contains both phase and magnitude information. What you are currently doing is implicitly only looking at the real part of the solution (hence the warning that the imaginary portion is being discarded), what you probably want instead to take the magnitude of your "a" array, but without more information on your application it is impossible to say.

sgillen
  • 596
  • 2
  • 7
7

I tried to put as much details as possible:

import pandas as pd
import matplotlib.pyplot as plt
from scipy.fft import fft, fftfreq
from scipy.signal import find_peaks

# First: Let's generate a dummy dataframe with X,Y
# The signal consists in 3 cosine signals with noise added. We terminate by creating
# a pandas dataframe.

import numpy as np
X=np.arange(start=0,stop=20,step=0.01) # 20 seconds long signal sampled every 0.01[s]

# Signal components given by [frequency, phase shift, Amplitude]
GeneratedSignal=np.array([[5.50, 1.60, 1.0], [10.2, 0.25, 0.5], [18.3, 0.70, 0.2]])

Y=np.zeros(len(X))
# Let's add the components one by one
for P in GeneratedSignal:
    Y+=np.cos(2*np.pi*P[0]*X-P[1])*P[2] 

# Let's add some gaussian random noise (mu=0, sigma=noise):
noise=0.5
Y+=np.random.randn(len(X))*noise

# Let's build the dataframe:
dummy_data=pd.DataFrame({'X':X,'Y':Y})
print('Dummy dataframe: ')
print(dummy_data.head())

# Figure-1: The dummy data

plt.plot(X,Y)
plt.title('Dummy data')
plt.xlabel('time [s]')
plt.ylabel('Amplitude')
plt.show()

# ----------------------------------------------------
# Processing:

headers = ["X","Y"]

#original_data = pd.read_csv("testdata.csv",names=headers)
# Let's take our dummy data:

original_data = dummy_data

x = np.array(original_data["X"])
y = np.array(original_data["Y"])


# Assuming the time step is constant:
# (otherwise you'll need to resample the data at a constant rate).
dt=x[1]-x[0]  # time step of the data

# The fourier transform of y:
yf=fft(y, norm='forward')  
# Note: see  help(fft) --> norm. I chose 'forward' because it gives the amplitudes we put in.
# Otherwise, by default, yf will be scaled by a factor of n: the number of points


# The frequency scale
n = x.size   # The number of points in the data
freq = fftfreq(n, d=dt)

# Let's find the peaks with height_threshold >=0.05
# Note: We use the magnitude (i.e the absolute value) of the Fourier transform

height_threshold=0.05 # We need a threshold. 


# peaks_index contains the indices in x that correspond to peaks:

peaks_index, properties = find_peaks(np.abs(yf), height=height_threshold)

# Notes: 
# 1) peaks_index does not contain the frequency values but indices
# 2) In this case, properties will contain only one property: 'peak_heights'
#    for each element in peaks_index (See help(find_peaks) )

# Let's first output the result to the terminal window:
print('Positions and magnitude of frequency peaks:')
[print("%4.4f    \t %3.4f" %(freq[peaks_index[i]], properties['peak_heights'][i])) for i in range(len(peaks_index))]


# Figure-2: The frequencies

plt.plot(freq, np.abs(yf),'-', freq[peaks_index],properties['peak_heights'],'x')
plt.xlabel("Frequency")
plt.ylabel("Amplitude")
plt.show()

The terminal output:

Dummy dataframe: 
      X         Y
0  0.00  0.611829
1  0.01  0.723775
2  0.02  0.768813
3  0.03  0.798328

Positions and magnitude of frequency peaks:
5.5000       0.4980
10.2000      0.2575
18.3000      0.0999
-18.3000     0.0999
-10.2000     0.2575
-5.5000      0.4980

NOTE: Since the signal is real-valued, each frequency component will have a "double" that is negative (this is a property of the Fourier transform). This also explains why the amplitudes are half of those we gave at the beginning. But if, for a particular frequency, we add the amplitudes for the negative and positive components, we get the original amplitude of the real-valued signal.

For further exploration: You can change the length of the signal to 1 [s] (at the beginning of the script):

X=np.arange(start=0,stop=1,step=0.01) # 1 seconds long signal sampled every 0.01[s]

Since the length of the signal is now reduced, the frequencies are less well defined (the peaks have now a width) So, add: width=0 to the line containing the find_peaks instruction:

peaks_index, properties = find_peaks(np.abs(yf), height=height_threshold, width=0)

Then look at what is contained inside properties:

print(properties)

You'll see that find_peaks gives you much more informations than just the peaks positions. For more info about what is inside properties: help(find_peaks)

Figures:

Figure-1

Figure-2

S_Bersier
  • 1,156
  • 4
  • 5