
I have data collected from sensors that looks like:

sec   nanosec  value
1001  1        0.2
1001  2        0.2
1001  3        0.2
1002  1        0.1
1002  2        0.2
1002  3        0.1
1003  1        0.2
1003  2        0.2
1003  3        0.1
1004  1        0.2
1004  2        0.2
1004  3        0.2
1004  4        0.1

I want to calculate the average, standard deviation and some other stats like maximum and minimum for a column every 2 seconds, e.g. average for (1001, 1002) = 0.167 and average for (1003, 1004) ≈ 0.171.

From the tutorial http://earthpy.org/pandas-basics.html, I think I should convert the data to a time series and then use rolling_mean from pandas, but I am new to time-series data, so I am not sure whether that is the correct way. Also, how do I specify the frequency for the conversion, given that the first second has fewer observations? In my actual data I have fewer than 100 readings for second 1001 and then 100 observations per second from 1002 onwards.

I could also do a simple groupby on seconds, but that would group readings per second, not every 2 seconds. How could I combine the observations of 2 consecutive groups from the groupby and then do the analysis?
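For reference, here is a minimal sketch of that groupby idea: bucket the integer sec column into 2-second groups with integer division and aggregate per bucket. The DataFrame construction below just reproduces the sample from the question; adjust the column names to your real data.

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'sec':     [1001, 1001, 1001, 1002, 1002, 1002,
                1003, 1003, 1003, 1004, 1004, 1004, 1004],
    'nanosec': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4],
    'value':   [0.2, 0.2, 0.2, 0.1, 0.2, 0.1,
                0.2, 0.2, 0.1, 0.2, 0.2, 0.2, 0.1],
})

# 1001-1002 -> bucket 0, 1003-1004 -> bucket 1, ...
bucket = (df['sec'] - df['sec'].min()) // 2

stats = df.groupby(bucket)['value'].agg(['mean', 'std', 'max', 'min'])
print(stats)
#          mean       std  max  min
# sec
# 0    0.166667  0.051640  0.2  0.1
# 1    0.171429  0.048795  0.2  0.1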


2 Answers


I think you can first convert the column sec with to_timedelta, then set_index and resample by 2 seconds (2S):

df['sec'] = pd.to_timedelta(df.sec, unit='s')
df.set_index('sec', inplace=True)
print (df)
          nanosec  value
sec                     
00:16:41        1    0.2
00:16:41        2    0.2
00:16:41        3    0.2
00:16:42        1    0.1
00:16:42        2    0.2
00:16:42        3    0.1
00:16:43        1    0.2
00:16:43        2    0.2
00:16:43        3    0.1
00:16:44        1    0.2
00:16:44        2    0.2
00:16:44        3    0.2
00:16:44        4    0.1
print (df.value.resample('2S').mean())
sec
00:16:41    0.166667
00:16:43    0.171429
00:16:45         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S').std())
sec
00:16:41    0.051640
00:16:43    0.048795
00:16:45         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S').max())
sec
00:16:41    0.2
00:16:43    0.2
00:16:45    NaN
Freq: 2S, Name: value, dtype: float64
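If you want all of these statistics in one pass, the resampler can also aggregate a list of functions. A small sketch continuing from the code above (assumes a pandas version where Resampler.agg accepts a list of function names):

# mean, std, max and min over the same 2-second bins in one call
print (df.value.resample('2S').agg(['mean', 'std', 'max', 'min']))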

Maybe you need to change base in resample:

print (df.value.resample('2S', base=1).mean())
sec
00:16:42    0.166667
00:16:44    0.171429
00:16:46         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S', base=1).std())
sec
00:16:42    0.051640
00:16:44    0.048795
00:16:46         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S', base=1).max())
sec
00:16:42    0.2
00:16:44    0.2
00:16:46    NaN
Freq: 2S, Name: value, dtype: float64
print (df.value.resample('2S', base=2).mean())
sec
00:16:43    0.166667
00:16:45    0.171429
00:16:47         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S', base=2).std())
sec
00:16:43    0.051640
00:16:45    0.048795
00:16:47         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S', base=2).max())
sec
00:16:43    0.2
00:16:45    0.2
00:16:47    NaN
Freq: 2S, Name: value, dtype: float64
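Note that in newer pandas (1.1+) the base argument of resample is deprecated and offset plays the same role by shifting the bin edges. A hedged sketch of the equivalent of base=1, assuming offset behaves the same way on this timedelta index as on a datetime index:

# shift the 2-second bins by 1 second, like base=1 above
print (df.value.resample('2S', offset='1s').mean())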
  • I thought this would work, but I get a warning "Try using .loc[row_index,col_indexer] = value instead" for df1['header_stamp_secs'] = pd.to_timedelta(df1.header_stamp_secs, unit='s'), followed by the error "Only valid with DatetimeIndex or PeriodIndex". – AMisra Jul 15 '16 at 16:54
  • Interesting. What is your version of pandas? – jezrael Jul 15 '16 at 16:57
  • I am using pandas 0.13.1 – AMisra Jul 15 '16 at 17:00
  • Hmmm, the latest version is `0.18.1`; I think you can upgrade `pandas`. – jezrael Jul 15 '16 at 17:01

Borrowing jezrael's code to set things up:

df['sec'] = pd.to_timedelta(df.sec, unit='s')
df.set_index('sec', inplace=True)
print (df)
          nanosec  value
sec                     
00:16:41        1    0.2
00:16:41        2    0.2
00:16:41        3    0.2
00:16:42        1    0.1
00:16:42        2    0.2
00:16:42        3    0.1
00:16:43        1    0.2
00:16:43        2    0.2
00:16:43        3    0.1
00:16:44        1    0.2
00:16:44        2    0.2
00:16:44        3    0.2
00:16:44        4    0.1

Use pd.TimeGrouper('2S') and describe():

df.groupby(pd.TimeGrouper('2S')).describe()

(screenshot of the per-bin describe() output)
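Note that pd.TimeGrouper was deprecated in later pandas versions; pd.Grouper(freq=...) is the replacement spelling. A small sketch of the same grouping with it, restricted to the value column (that restriction is my own addition):

print (df.groupby(pd.Grouper(freq='2S'))['value'].describe())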
