3

My goal is to identify the times (in datetime format below) at which there is a local maximum above a certain threshold. I appreciate that other related answers deal with NumPy and SciPy techniques for finding local maxima and minima, but none, to my knowledge, addresses a threshold level.

I have the following pandas.Series, represented as df_1, which stores integer values for given times:

t_min
2015-12-26 14:45:00      46
2015-12-26 14:46:00      25
2015-12-26 14:47:00      39
2015-12-26 14:48:00      58
2015-12-26 14:49:00      89
2015-12-26 14:50:00      60
2015-12-26 14:51:00      57
2015-12-26 14:52:00      60
2015-12-26 14:53:00      46
2015-12-26 14:54:00      31
2015-12-26 14:55:00      66
2015-12-26 14:56:00      78
2015-12-26 14:57:00      49
2015-12-26 14:58:00      47
2015-12-26 14:59:00      31
2015-12-26 15:00:00      55
2015-12-26 15:01:00      19
2015-12-26 15:02:00      10
2015-12-26 15:03:00      31
2015-12-26 15:04:00      36
2015-12-26 15:05:00      61
2015-12-26 15:06:00      29
2015-12-26 15:07:00      32
2015-12-26 15:08:00      49
2015-12-26 15:09:00      35
2015-12-26 15:10:00      17
2015-12-26 15:11:00      22

I use the following to find the array indexes at which local maxima occur, as per a response in another answer here:

import numpy as np
from scipy.signal import argrelextrema

x = np.array(df_1, dtype=float)

# for local maxima
print(argrelextrema(x, np.greater))

However, I want an array of the TIMES at which these maxima occur, not the integer (now converted to float) values at these indexes, which I can already get with x[argrelextrema(x, np.greater)[0]]. Any idea how I could obtain an array of those times?

Following this, I also want to refine the list of times by selecting only maxima above a certain threshold, i.e. whose slope is above a certain limit. This would let me skip minor local maxima and identify only the most significant “peaks”. Would anyone have a suggestion on how to do this?

DK99

3 Answers

2

You can find the peaks by taking the difference between shifted x-arrays:

In [14]: x
Out[14]: 
array([ 46.,  25.,  39.,  58.,  89.,  60.,  57.,  60.,  46.,  31.,  66.,
        78.,  49.,  47.,  31.,  55.,  19.,  10.,  31.,  36.,  61.,  29.,
        32.,  49.,  35.,  17.,  22.])

In [15]: x[1:] - x[:-1]
Out[15]: 
array([-21.,  14.,  19.,  31., -29.,  -3.,   3., -14., -15.,  35.,  12.,
       -29.,  -2., -16.,  24., -36.,  -9.,  21.,   5.,  25., -32.,   3.,
        17., -14., -18.,   5.])

The values of x[1:] - x[:-1] give the "slope" between the values of x. By picking where this slope changes from positive to negative you can find out the indices of the peaks in your original array.

In [33]: slope = x[1:] - x[:-1]

In [34]: indices = [i+1 for i in range(len(slope)-1) if slope[i] > 0 and slope[i+1] < 0]

In [35]: indices
Out[35]: [4, 7, 11, 15, 20, 23]

In [36]: [x[j] for j in indices]
Out[36]: [89, 60, 78, 55, 61, 49]

I didn't bother to make a list of the times, but since you have the indices...
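A minimal sketch of that last step, assuming the data is a pandas Series with a one-minute DatetimeIndex as shown in the question. The names `series`, `peak_idx` and `peak_times` are illustrative and not from the original post. It also swaps the list comprehension for an equivalent vectorized comparison on `np.diff`:

```python
import numpy as np
import pandas as pd

values = [46, 25, 39, 58, 89, 60, 57, 60, 46, 31, 66, 78, 49, 47,
          31, 55, 19, 10, 31, 36, 61, 29, 32, 49, 35, 17, 22]
# Assumed reconstruction of the question's data: one sample per minute.
times = pd.date_range("2015-12-26 14:45", periods=len(values), freq="min")
series = pd.Series(values, index=times)

x = series.to_numpy(dtype=float)
slope = np.diff(x)  # same as x[1:] - x[:-1]

# A peak sits where the slope changes from positive to negative.
peak_idx = np.flatnonzero((slope[:-1] > 0) & (slope[1:] < 0)) + 1

# Positional indexing into the DatetimeIndex gives the peak times.
peak_times = series.index[peak_idx]
print(peak_times)
```

Like the loop above, this ignores flat-topped peaks, where the slope passes through zero between the rise and the fall.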

Roland Smith
  • Thanks for your answer. I am actually using the `argrelextrema` method from `scipy.signal` that I mentioned above: `x = np.array(df_1, dtype=float)` followed by `print(argrelextrema(x, np.greater))`, which gives me the array [4, 7, 11, 15, 20, 23]. However, I want to return the times from the original data frame that these indices refer to. Would you have any idea how to do this? Thanks in advance. – DK99 Jan 03 '16 at 15:36
  • Make a list out of the first column in the dataframe (let's call that `d`), and index that: `[d[j] for j in indices]`. – Roland Smith Jan 03 '16 at 17:12
1

As of SciPy version 1.1, you can also use find_peaks:

import numpy as np                    
import matplotlib.pyplot as plt
from scipy.signal import find_peaks 

x = np.array([ 46.,  25.,  39.,  58.,  89.,  60.,  57.,  60.,  46.,  31.,  66.,
        78.,  49.,  47.,  31.,  55.,  19.,  10.,  31.,  36.,  61.,  29.,
        32.,  49.,  35.,  17.,  22.])

peaks, _ = find_peaks(x)

plt.plot(x)
plt.plot(peaks, x[peaks], "x")
plt.show()

This plots x and marks every local maximum with an "x":

[Figure: plot of x with all local maxima marked]

If you now want to apply a threshold (e.g. 60), pass it as the height argument (the rest of the code is identical):

peaks, _ = find_peaks(x, height=60)

This plots only the peaks with a height of at least 60:

[Figure: plot of x with only the peaks at or above 60 marked]
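To get from the peak indices back to timestamps, one option is to index the original Series by position. The sketch below assumes the question's data is rebuilt as a pandas Series with a one-minute DatetimeIndex; the names `series`, `significant` and `prominent` are illustrative. It also shows the `prominence` argument of `find_peaks`, which measures how far a peak stands out from its surroundings and may be closer to the "most significant peaks" the question asks for. The value 30 is an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

values = [46, 25, 39, 58, 89, 60, 57, 60, 46, 31, 66, 78, 49, 47,
          31, 55, 19, 10, 31, 36, 61, 29, 32, 49, 35, 17, 22]
# Assumed reconstruction of the question's data: one sample per minute.
times = pd.date_range("2015-12-26 14:45", periods=len(values), freq="min")
series = pd.Series(values, index=times)
x = series.to_numpy(dtype=float)

# height=60 keeps only local maxima whose value is at least 60.
peaks, _ = find_peaks(x, height=60)

# iloc selects by position, so the result keeps the timestamps as its index.
significant = series.iloc[peaks]
print(significant)

# Alternative: keep peaks that rise at least 30 above their surroundings.
prominent, _ = find_peaks(x, prominence=30)
print(series.index[prominent])
```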

Cleb
0

If I understand correctly, all you need to do after calling argrelextrema is apply the resulting indices to the times. Given your initial snippet:

x = np.array(df_1, dtype=float)

# for local maxima
print(argrelextrema(x, np.greater))

All you need to do is modify it like this:

indices = argrelextrema(x, np.greater)

# argrelextrema returns a tuple with one array of indices per axis of x,
# so [0] selects the positions along the only axis.
# The times are in the index of df_1, so index into it by position:
print(df_1.index[indices[0]])
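Since the question also asks for a threshold, the same indices can be filtered before looking up the times. This is a sketch under assumptions: df_1 is rebuilt here as a Series with a one-minute DatetimeIndex from the question's data, and the threshold of 60 is an arbitrary illustrative value:

```python
import numpy as np
import pandas as pd
from scipy.signal import argrelextrema

values = [46, 25, 39, 58, 89, 60, 57, 60, 46, 31, 66, 78, 49, 47,
          31, 55, 19, 10, 31, 36, 61, 29, 32, 49, 35, 17, 22]
# Assumed reconstruction of the question's data: one sample per minute.
times = pd.date_range("2015-12-26 14:45", periods=len(values), freq="min")
df_1 = pd.Series(values, index=times)

x = df_1.to_numpy(dtype=float)
idx = argrelextrema(x, np.greater)[0]  # positions of all local maxima

threshold = 60  # illustrative value
idx = idx[x[idx] > threshold]  # keep maxima strictly above the threshold

peak_times = df_1.index[idx]
print(peak_times)
```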
durbachit