
I have a pandas.DataFrame indexed by time, as shown below. The other column contains data recorded from a device measuring current. I want to filter the second column with a low-pass filter with a cutoff frequency of 5 Hz to eliminate high-frequency noise. I want to get a DataFrame back, but I do not mind if the data changes type while the filter is applied (NumPy array, etc.).

In [18]: print df.head()
Time
1.48104E+12    1.1185
1.48104E+12    0.8168
1.48104E+12    0.8168
1.48104E+12    0.8168
1.48104E+12    0.8168

I am plotting this data with df.plot(legend=True, use_index=False, color='red') but would like to plot the filtered data instead.

I am using pandas 0.18.1, but I can upgrade.

I have looked at https://oceanpython.org/2013/03/11/signal-filtering-butterworth-filter/ and many other sources describing similar approaches.
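A common way to do this is a Butterworth low-pass filter (the approach named in the oceanpython link) applied with zero-phase filtering. The sketch below uses `scipy.signal.butter` and `scipy.signal.filtfilt`; the `fs=` argument of `butter` needs SciPy 1.2 or later. It assumes the data are uniformly sampled, contain no NaNs, and that the `Time` index holds epoch milliseconds (1.48104E+12 fits late 2016 in ms, but that is a guess). The helper names and the column name `current` are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.signal import butter, filtfilt


def estimate_fs_from_ms_index(index):
    """Estimate the sampling rate in Hz.

    ASSUMPTION: the index holds timestamps in milliseconds (e.g. epoch ms).
    """
    dt_ms = np.median(np.diff(np.asarray(index, dtype=float)))
    return 1000.0 / dt_ms


def butter_lowpass_df(df, column, cutoff_hz, fs_hz, order=4):
    """Return a copy of df with `column` zero-phase low-pass filtered.

    ASSUMPTION: `column` is uniformly sampled at fs_hz and has no NaNs
    (a single NaN would turn the whole filtfilt output into NaN).
    """
    # Butterworth low-pass design; fs= lets the cutoff be given in Hz.
    b, a = butter(order, cutoff_hz, btype="low", fs=fs_hz)
    out = df.copy()
    # filtfilt filters forward then backward, so the result has no phase lag.
    out[column] = filtfilt(b, a, out[column].to_numpy())
    return out


# Usage on synthetic data (the column name "current" is hypothetical).
fs = 100.0                                   # assumed sampling rate, Hz
t = np.arange(1000) / fs                     # seconds
time_ms = 1.48104e12 + t * 1000.0            # epoch-millisecond-style index
raw = np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.sin(2 * np.pi * 30.0 * t)
df = pd.DataFrame({"current": raw}, index=pd.Index(time_ms, name="Time"))

fs_est = estimate_fs_from_ms_index(df.index)
filtered = butter_lowpass_df(df, "current", cutoff_hz=5.0, fs_hz=fs_est)
```

The result is still a DataFrame with the original index, so it can be plotted the same way: `filtered.plot(legend=True, use_index=False, color='red')`. The 5 Hz cutoff must be below half the sampling rate, otherwise `butter` rejects it.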

Andrew
  • *"Occasionally there may be an empty cell in the second column."* If you want to use any of the scipy signal processing functions, you'll have to decide how you want to deal with these missing values first. (The process of replacing missing values is known as [imputation](https://en.wikipedia.org/wiki/Imputation_(statistics)) in statistics.) – Warren Weckesser Feb 20 '17 at 18:56
  • Here's another Stack Overflow Q&A that you might find helpful: http://stackoverflow.com/questions/25191620/creating-lowpass-filter-in-scipy-understanding-methods-and-units – Warren Weckesser Feb 21 '17 at 17:31
  • It seems like the pandas equivalent of `scipy.signal.decimate` (low-pass filter, then decimate) is to first apply `.rolling` (to low-pass the signal with a window-function FIR filter) and then apply `.resample`? But it can be extremely slow. There may be a better way I'm missing. – endolith Dec 01 '22 at 16:55
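The two points raised in these comments, missing values and a pandas-only filter-then-downsample route, can be sketched together. This fills NaNs by linear interpolation (one imputation choice among many), smooths with a centered Hamming-weighted rolling mean (a windowed-FIR low-pass), then keeps every q-th sample. It uses plain `.iloc` slicing rather than `.resample`, because `.resample` needs a DatetimeIndex. `win_type` requires SciPy to be installed; the helper names and the window length are hypothetical. For NumPy arrays, SciPy's own `scipy.signal.decimate` performs the filter-then-downsample step.

```python
import numpy as np
import pandas as pd


def impute_and_smooth(series, window, win_type="hamming"):
    """Fill NaNs, then apply a windowed-FIR low-pass via a weighted rolling mean."""
    # Linear interpolation is one simple imputation choice; it also fills the ends.
    filled = series.interpolate(method="linear", limit_direction="both")
    # Each output is sum(w * x) / sum(w) over a centered window of `window`
    # samples, i.e. an FIR filter whose taps are the normalized window weights.
    # Edges without a full window come out as NaN. Requires SciPy for win_type.
    return filled.rolling(window=window, win_type=win_type, center=True).mean()


def lowpass_then_downsample(series, window, q, win_type="hamming"):
    """Smooth, then keep every q-th sample (a crude filter-then-decimate)."""
    smoothed = impute_and_smooth(series, window, win_type)
    # Plain slicing works on any index; .resample would need a DatetimeIndex.
    return smoothed.iloc[::q]


# Usage with made-up data: a constant series with gaps, and worst-case
# alternating +1/-1 "noise" at the highest representable frequency.
gappy = pd.Series([2.0, np.nan, 2.0, 2.0, np.nan, 2.0, 2.0, 2.0, 2.0, 2.0])
alternating = pd.Series([1.0, -1.0] * 10)
smoothed_gappy = impute_and_smooth(gappy, window=5)
smoothed_alt = impute_and_smooth(alternating, window=5)
decimated = lowpass_then_downsample(alternating, window=5, q=2)
```

The window length and `q` depend on the sampling rate and the 5 Hz target, which the question does not state, so they need tuning against the real data.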

1 Answer


Perhaps I am over-simplifying this, but you could create a simple condition, build a new DataFrame from it, and then plot the new DataFrame. This just reduces the DataFrame to the records that meet the condition. I admit I do not know the exact threshold for "high frequency", but let's assume your second column is named "Frequency":

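# Boolean mask: True for rows whose "Frequency" value is below 1.0
condition = df["Frequency"] < 1.0
# Keep only the rows where the mask is True
low_pass_df = df[condition]
low_pass_df.plot(legend=True, use_index=False, color='red')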
git_rekt
  • This is not relevant to the question, which is about digital filtering of a time series or digital signal. https://en.wikipedia.org/wiki/Digital_filter Not the type of "filtering" used in pandas, which selects rows based on criteria. – endolith Dec 01 '22 at 16:50
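To make the distinction in this comment concrete, the sketch below builds a synthetic 1 Hz signal with 25 Hz noise and applies both operations. Boolean row selection drops rows but leaves the noise untouched in the rows it keeps; a Butterworth low-pass (`scipy.signal.butter` plus `filtfilt`, assuming SciPy 1.2+ for the `fs=` argument) keeps every row and attenuates the 25 Hz component. The data, the 100 Hz sampling rate, and the `current` column name are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.signal import butter, filtfilt

# Synthetic data: 1 Hz "signal" plus 25 Hz "noise", sampled at an assumed 100 Hz.
fs = 100.0
t = np.arange(500) / fs
clean = np.sin(2 * np.pi * 1.0 * t)
df = pd.DataFrame({"current": clean + 0.3 * np.sin(2 * np.pi * 25.0 * t)}, index=t)

# Row selection (what the answer does): drops rows that fail the condition;
# the rows that remain still carry the 25 Hz noise unchanged.
selected = df[df["current"] < 1.0]

# Digital low-pass: every row is kept, and the 25 Hz component is attenuated.
b, a = butter(4, 5.0, btype="low", fs=fs)
lowpassed = df.assign(current=filtfilt(b, a, df["current"].to_numpy()))
```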