Creating an upper envelope for a broadband, noisy signal using pandas

Question

I'm using Python 2.7. The title gives the context; I phrased it this way so people can find this Stack Exchange question in the future. There is plenty of documentation for doing this in MATLAB, but very little for SciPy, NumPy, pandas, matplotlib, etc.

Essentially, I have the following dataframe:

   time  amplitude
0   1.0        0.1
1   2.0       -0.3
2   3.0        1.4
3   4.0        4.2
4   5.0       -5.7
5   6.0        2.3
6   7.0       -0.2
7   8.0       -0.3
8   9.0        1.0
9  10.0        0.1

Now what I want to do is the following:

in 5-second intervals, look for the max and min values
record the max and min values with their corresponding time values (i.e. for the above case, in the first 5 seconds, the max is 4.2 at 4 seconds and the min is -5.7 at 5 seconds)

append values in appropriate place into the data frame i.e.

time amplitude upper lower
0 1.0  0.1       
1 2.0 -0.3
2 3.0  1.4
3 4.0  4.2       4.2
4 5.0  -5.7            -5.7
5 6.0  2.3       2.3
6 7.0  -0.8            -0.8
7 8.0  -0.3
8 9.0   1.0
9 10.0  0.1

interpolate between the max values and between the min values to flesh out the dataframe
plot amplitude column, upper column and lower column

I'm familiar enough with python/pandas and imagine the code looking something like the following:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy as scipy

time = [0,1,2,3,4,5,6,7,8,9]
amplitude = [0.1,-0.3,1.4,4.2,-5.7,2.3,-0.2,-0.3,1.0,0.1]
df = pd.DataFrame({'time': time, 'amplitude': amplitude})
plt.plot(df['time'], df['amplitude'])

for seconds in time:
    if <interval == 5>:
        max = []
        time_max = []
        min = []
        time_min = []

        max.append(df['amplitude'].max())
        min.append(df['amplitude'].min())
        time_max.append(<time value in interval>)
        time_min.append(<time value in interval>)

  <build another dataframe>
  <concat to existing dataframe df>
  <interpolate between values in column 'upper'>
  <interpolate between values in column 'lower'>

any help is appreciated.

thank you.

~devin

The `amplitude` value in row 6 is different in the first and second display of your example data (`-0.2` vs `-0.8`). Which is it? — andrew_reece, May 14 '17 at 19:35
the specifics are very irrelevant. i answered my own question btw — Devin Liner, May 14 '17 at 19:57

score 0 · Accepted Answer · answered May 14 '17 at 20:05

Pandas resample() and interpolate() will help here. To get seconds as a DatetimeIndex, start with an arbitrary Datetime - you can always chop off the year when you're done:

df.set_index(pd.to_datetime("2017") + df.time * pd.offsets.Second(), inplace=True)

print(df)
                     time  amplitude
time                                
2017-01-01 00:00:01   1.0        0.1
2017-01-01 00:00:02   2.0       -0.3
2017-01-01 00:00:03   3.0        1.4
2017-01-01 00:00:04   4.0        4.2
2017-01-01 00:00:05   5.0       -5.7
2017-01-01 00:00:06   6.0        2.3
2017-01-01 00:00:07   7.0       -0.2
2017-01-01 00:00:08   8.0       -0.3
2017-01-01 00:00:09   9.0        1.0
2017-01-01 00:00:10  10.0        0.1
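
Multiplying a float column by `pd.offsets.Second()` may not work on recent pandas versions. A minimal sketch of the same index construction using `pd.to_timedelta`, assuming the `time` column holds seconds; the 2017-01-01 start date is as arbitrary as in the line above:

```python
import pandas as pd

df = pd.DataFrame({
    "time": [float(t) for t in range(1, 11)],
    "amplitude": [0.1, -0.3, 1.4, 4.2, -5.7, 2.3, -0.2, -0.3, 1.0, 0.1],
})

# Convert the seconds column to timedeltas and add an arbitrary start date.
# Assigning the result as the index gives a DatetimeIndex that resample() can use.
df.index = pd.Timestamp("2017-01-01") + pd.to_timedelta(df["time"], unit="s")

print(df)
```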

Resample every 5 seconds, and get summary statistics min and max:

summary = (df.resample('5S', label='right', closed='right')
             .agg({"amplitude":{"lower":"min","upper":"max"}}))
summary.columns = summary.columns.droplevel(0)

print(summary)
                     upper  lower
time                             
2017-01-01 00:00:05    4.2   -5.7
2017-01-01 00:00:10    2.3   -0.3
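
The nested-dict form of `agg` shown above was deprecated in pandas 0.20 and raises an error from pandas 1.0 onward. A sketch of an equivalent summary for newer versions, assuming the same data and index as above: aggregate with a list of functions, then rename the resulting columns. Passing a `pd.Timedelta` as the rule is a choice made here to avoid depending on the spelling of the seconds alias.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [float(t) for t in range(1, 11)],
    "amplitude": [0.1, -0.3, 1.4, 4.2, -5.7, 2.3, -0.2, -0.3, 1.0, 0.1],
})
df.index = pd.Timestamp("2017-01-01") + pd.to_timedelta(df["time"], unit="s")

# Min and max of amplitude in right-closed, right-labelled 5-second bins.
summary = (
    df["amplitude"]
    .resample(pd.Timedelta(seconds=5), label="right", closed="right")
    .agg(["min", "max"])
    .rename(columns={"min": "lower", "max": "upper"})
)

print(summary)
```

The column order here is `lower`, `upper`; the values match the summary printed above.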

Merge with original df and interpolate missing values. (Note that interpolation is only possible between two values, so the first few entries will be NaN.)

df2 = df.merge(summary, how='left', left_index=True, right_index=True)
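# Assign the result back instead of using inplace=True on a column accessor,
# which may silently leave df2 unchanged on newer pandas versions.
df2['lower'] = df2['lower'].interpolate()
df2['upper'] = df2['upper'].interpolate()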

print(df2)
                     time  amplitude  upper  lower
time                                              
2017-01-01 00:00:01   1.0        0.1    NaN    NaN
2017-01-01 00:00:02   2.0       -0.3    NaN    NaN
2017-01-01 00:00:03   3.0        1.4    NaN    NaN
2017-01-01 00:00:04   4.0        4.2    NaN    NaN
2017-01-01 00:00:05   5.0       -5.7   4.20  -5.70
2017-01-01 00:00:06   6.0        2.3   3.82  -4.62
2017-01-01 00:00:07   7.0       -0.2   3.44  -3.54
2017-01-01 00:00:08   8.0       -0.3   3.06  -2.46
2017-01-01 00:00:09   9.0        1.0   2.68  -1.38
2017-01-01 00:00:10  10.0        0.1   2.30  -0.30

Finally, plot the output:

plot_cols = ['amplitude','lower','upper']
df2[plot_cols].plot()

Note: If you want the index to only display seconds, just use:

df2.index = df2.index.second
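
Note that the steps above stamp each window's extremes at the window's right edge (00:00:05 and 00:00:10), not at the times where they actually occur, which is what the question asks for (4.2 at 4 seconds, -5.7 at 5 seconds). A minimal sketch of that variant, assuming uniformly sampled data so that 5 rows correspond to 5 seconds; `window`, `groups`, and `envelope` are names made up for this illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time": np.arange(1.0, 11.0),
    "amplitude": [0.1, -0.3, 1.4, 4.2, -5.7, 2.3, -0.2, -0.3, 1.0, 0.1],
})

window = 5                                # rows per window (assumes 1 sample per second)
groups = np.arange(len(df)) // window     # 0,0,0,0,0,1,1,1,1,1

# Row labels of the largest and smallest amplitude inside each window.
upper_idx = df.groupby(groups)["amplitude"].idxmax()
lower_idx = df.groupby(groups)["amplitude"].idxmin()

# Keep the amplitude only on the rows that hold an extreme; NaN elsewhere.
df["upper"] = df["amplitude"].where(df.index.isin(upper_idx))
df["lower"] = df["amplitude"].where(df.index.isin(lower_idx))

# Interpolate linearly against the time values between the marked extremes.
envelope = df.set_index("time")[["upper", "lower"]].interpolate(method="index")

print(envelope)
```

Rows before the first marked extreme stay NaN, for the same reason as in the merged frame above.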

i answered my question differently (see post), but your solution is more pythonic. thank you!!! — Devin Liner, May 14 '17 at 20:18

score 0 · Answer 2 · edited May 23 '17 at 12:26

I pretty much copied this: "How to get high and low envelope of a signal?" (but in a more pandas/DataFrame-oriented way).
I used this as well: "Subsetting Data Frame into Multiple Data Frames in Pandas"
and I ran into this problem for the very first time:

I hope this helps people create arbitrary envelopes for noisy signals / time series data like it helped me!!!!

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy as scipy

time_array = [0,1,2,3,4,5,6,7,8,9]
value_array = [0.1,-0.3,1.4,4.2,-5.7,2.3,-0.2,-0.3,1.0,0.1]

upper_time = []
upper_value = []
lower_time = []
lower_value = []

df = pd.DataFrame({'time': time_array, 'value': value_array})


# Group rows into windows of 2 by integer-dividing the row label.
# `//` keeps this working on Python 3, where `/` is true division.
for element, df_k in df.groupby(lambda x: x // 2):
    df_temp = df_k.reset_index(drop=True)

    # Time and value of the largest sample in this window
    i_max = df_temp['value'].idxmax()
    upper_time.append(df_temp['time'].loc[i_max])
    upper_value.append(round(df_temp['value'].loc[i_max], 1))

    # Time and value of the smallest sample in this window
    i_min = df_temp['value'].idxmin()
    lower_time.append(df_temp['time'].loc[i_min])
    lower_value.append(round(df_temp['value'].loc[i_min], 1))



plt.plot(df['time'],df['value'])
plt.plot(upper_time,upper_value)
plt.plot(lower_time,lower_value)
plt.show()
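
The plot above joins the window extremes with straight lines. To get an envelope value at every sample, which is the interpolation step the question asks for, `np.interp` is one option. A minimal NumPy-only sketch using 5-sample windows on the same data; `window`, `upper`, and `lower` are illustrative names, not part of the code above:

```python
import numpy as np

time = np.arange(10)
value = np.array([0.1, -0.3, 1.4, 4.2, -5.7, 2.3, -0.2, -0.3, 1.0, 0.1])

window = 5  # samples per window
upper_time, upper_value, lower_time, lower_value = [], [], [], []

for start in range(0, len(value), window):
    chunk = value[start:start + window]
    hi = start + int(np.argmax(chunk))   # position of the window maximum
    lo = start + int(np.argmin(chunk))   # position of the window minimum
    upper_time.append(time[hi])
    upper_value.append(value[hi])
    lower_time.append(time[lo])
    lower_value.append(value[lo])

# Piecewise-linear envelopes evaluated at every sample time.
# Outside the first/last extreme, np.interp holds the end value constant.
upper = np.interp(time, upper_time, upper_value)
lower = np.interp(time, lower_time, lower_value)
```

The resulting `upper` and `lower` arrays can be passed to `plt.plot(time, upper)` and `plt.plot(time, lower)` in place of the point lists.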

note**** i made this envelope by looking at 2 second intervals on accident. oops. just change df.groupby(lambda x: x/2) to df.groupby(lambda x: x/5) — Devin Liner, May 14 '17 at 20:19
