
I'm a novice data analyst, analyzing some gas concentrations over a time series of a couple of thousand points (so, small). I graphed it with Matplotlib, and there are some easy-to-see points where things change rapidly.

What is the canonical / easiest way to home in on those points?

Dirk
  • Do you mean comparing values against the previous value? `diff()` shows the difference from the previous row, if that's any help. – EdChum Feb 19 '15 at 21:52
  • I am comparing values to earlier values in the time series, say comparing point n with point n-10. – Dirk Feb 20 '15 at 01:06
  • Yeah, like Ed said, check out diff(). Maybe filter on the bigger values to slim down what you're looking at. There's also rolling_mean, which could help identify more sustained spikes. – Bob Haffner Feb 20 '15 at 03:12
  • Like Bob said, take the rolling_mean of the diff. I'd spend some time experimenting with the window size for rolling_mean while deciding what I mean by "rapidly". – cphlewis Feb 20 '15 at 03:24
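The diff-and-filter idea from the comments can be sketched like this. It is a minimal sketch, assuming the readings sit in a pandas Series indexed by sample number; the synthetic data and the threshold of 25 are made up for illustration. `Series.diff(periods=10)` compares point n with point n-10, and keeping only the large absolute changes slims down what you have to look at.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: flat at 400, jumping to 450 at sample 1000
conc = pd.Series(np.full(2000, 400.0))
conc.iloc[1000:] = 450.0

# Change relative to the value 10 samples earlier (point n vs point n-10);
# the first 10 entries are NaN because there is no earlier value
lag_change = conc.diff(periods=10)

# Keep only the big changes; the threshold is an arbitrary choice
threshold = 25.0
big_changes = lag_change[lag_change.abs() > threshold]

print(big_changes.index.min(), big_changes.index.max())
```

A single step shows up as a run of flagged samples (here 1000 through 1009), because every point within 10 samples after the jump sees it; the start of each run is a reasonable candidate for the change point.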

1 Answer

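import pandas as pd

# Acquire data here: one row per sample, with columns 'Year' and 'Recon'
ff = pd.DataFrame(columns=('Year', 'Recon'))
# First difference (change from the previous row), 0 for the first row
ff['diff'] = ff['Recon'].diff().fillna(0)
# Rolling means of the difference over 10- and 5-sample windows
# (pd.rolling_mean was removed in later pandas; use .rolling(n).mean())
ff['rolling10'] = ff['diff'].rolling(10).mean()
ff['rolling5'] = ff['diff'].rolling(5).mean()
ff.plot(x='Year', y=['rolling5', 'rolling10'], subplots=False)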
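A self-contained, runnable version of the same approach, with made-up evenly sampled data standing in for the real measurements (the `Year`/`Recon` column names are kept from the answer; the values are hypothetical). Instead of only plotting, it uses `idxmax()` on the absolute rolling mean to point at the steepest sustained change.

```python
import numpy as np
import pandas as pd

# Hypothetical evenly sampled data: a slow drift plus one rapid rise at 2000
years = np.arange(1000, 3000)
recon = np.where(years < 2000, 0.0, 5.0) + 0.001 * (years - 1000)
ff = pd.DataFrame({'Year': years, 'Recon': recon})

# First difference, with 0 for the first row (no previous value)
ff['diff'] = ff['Recon'].diff().fillna(0)

# Rolling means of the difference smooth out single-sample noise;
# the first window-1 rows are NaN
ff['rolling10'] = ff['diff'].rolling(10).mean()
ff['rolling5'] = ff['diff'].rolling(5).mean()

# Row where the 10-sample rolling mean is largest in magnitude (NaNs skipped)
peak_row = ff['rolling10'].abs().idxmax()
print(ff.loc[peak_row, 'Year'])
```

pandas rolling windows are trailing by default (`center=False`), so the flagged row can lag the actual change by up to window-1 samples; passing `center=True` to `rolling()` centers the window on each row instead.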

But note: my test data was evenly sampled, and fixed-size windows like these count points rather than time. At the time of writing, the rolling functions didn't handle irregular time series, though there are some workarounds: Pandas: rolling mean by time interval
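For irregularly sampled data, pandas 0.19 and later also accept a time offset string as the window when the data has a sorted `DatetimeIndex`, so each window covers a fixed span of time rather than a fixed number of points. A minimal sketch with made-up timestamps and values; the `'10min'` window is an arbitrary choice:

```python
import pandas as pd

# Hypothetical irregularly spaced readings
times = pd.to_datetime([
    '2015-02-19 00:00', '2015-02-19 00:01', '2015-02-19 00:07',
    '2015-02-19 00:08', '2015-02-19 00:30',
])
conc = pd.Series([400.0, 401.0, 402.0, 450.0, 451.0], index=times)

# Change from the previous reading (NaN for the first one)
change = conc.diff()

# Each window covers the preceding 10 minutes, however many points fall in it
rolling_change = change.rolling('10min').mean()
print(rolling_change)
```

The jump at 00:08 is averaged with only the readings from the previous 10 minutes, while the isolated reading at 00:30 is averaged on its own, which is what you want when the gaps between samples vary.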

cphlewis