How can I split pandas dataframe into groups of peaks

Question

I have a dataset in an excel file I'm trying to analyse.

Example data:

Time in s  Displacement in mm    Force in N
          0                   0  Not Relevant
          1                   1  Not Relevant
          2                   2  Not Relevant
          3                   3  Not Relevant
          4                   2  Not Relevant
          5                   1  Not Relevant
          6                   0  Not Relevant
          7                   2  Not Relevant
          8                   3  Not Relevant
          9                   4  Not Relevant
         10                   5  Not Relevant
         11                   6  Not Relevant
         12                   5  Not Relevant
         13                   4  Not Relevant
         14                   3  Not Relevant
         15                   2  Not Relevant
         16                   1  Not Relevant
         17                   0  Not Relevant
         18                   4  Not Relevant
         19                   5  Not Relevant
         20                   6  Not Relevant
         21                   7  Not Relevant
         22                   6  Not Relevant
         23                   5  Not Relevant
         24                   4  Not Relevant
         24                   0  Not Relevant

Imported from an xls file and then plotting a graph of time vs displacement:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel(
    'DATA.xls',
    engine='xlrd', usecols=['Time in s', 'Displacement in mm', 'Force in N'])


fig, ax = plt.subplots()
ax.plot(df['Time in s'], df['Displacement in mm'])

ax.set(xlabel='Time (s)', ylabel='Disp',
       title='time disp')
ax.grid()

fig.savefig("time_disp.png")
plt.show()

I'd like to split the data into multiple groups to analyse separately. So if I plot displacement against time, I get a sawtooth as a sample is being cyclically loaded.

I'd like to split the data so that each "tooth" is its own group or dataset so I can analyse each cycle

Can anyone help?

Always provide a complete [mre] with code, **data, errors, current output, and expected output**, as **[formatted text](https://stackoverflow.com/help/formatting)**. If relevant, only plot images are okay. Please see [How to ask a good question](https://stackoverflow.com/help/how-to-ask). Provide data with [How to provide a reproducible copy of your DataFrame using `df.head(15).to_clipboard(sep=',')`](https://stackoverflow.com/questions/52413246), then **[edit] your question**, and paste the clipboard into a code block. — Trenton McKinney, Feb 05 '21 at 19:41
In your real data, Displacement in mm is exactly 0 when you start a new cycle? or it is a local minimum? — Ben.T, Feb 05 '21 at 20:34

score 0 · Accepted Answer · answered Feb 05 '21 at 21:28

you can create a column group with a value changing at each local minimum. First get True at a local minimum and use two diff once forward and once backward. Then use cumsum to increase the group number each time a local minimum is.

df['gr'] = (~(df['Deplacement'].diff(1)>0) 
             & ~(df['Deplacement'].diff(-1)>0)).cumsum()
print(df)
    Time  Deplacement  gr
0      0            0   1
1      1            1   1
2      2            2   1
3      3            3   1
4      4            2   1
5      5            1   1
6      6            0   2
7      7            2   2
8      8            3   2
9      9            4   2
10    10            5   2
11    11            6   2
12    12            5   2
13    13            4   2
14    14            3   2
15    15            2   2
16    16            1   2
17    17            0   3
18    18            4   3
19    19            5   3

you can split the data by selecting each group individually, or you could do something with a loop and do anything you want in each loop.

s = (~(df['Deplacement'].diff(1)>0) 
      & ~(df['Deplacement'].diff(-1)>0)).cumsum()
for _, dfg in df.groupby(s):
    print(dfg)
    # analyze as needed

Edit: in the case of the data in your question with 0 as a minimum, then doing df['gr'] = df['Deplacement'].eq(0).cumsum() would work as well, but it is specific to minimum being exactly 0

I'm having trouble understanding the logic of the code you suggested. It works very well for the application, I was just experimenting and wanted to see what my data would look like if I could also split it into minimum-maximum, maximum-minimum. So basically split each "tooth" of the sawtooth into the ascending part and the descending part — frogbutt, Feb 09 '21 at 14:44
@frogbutt so a local minimum is when a point is less than the previous and less than the next one hence the use of `diff`. I do `>0` but the use `~` reverse the result so `~...>0` would be equivalent to `...<0`. but I understand that I over complicated the answer to avoid boundaries effect and it was not really necessary. For split each tooth, try like this `df['gr'] = ( ( (df['Deplacement'].diff(1)<0)&(df['Deplacement'].diff(-1)<0)) |( (df['Deplacement'].diff(1)>0)&(df['Deplacement'].diff(-1)>0)) ).cumsum()`. Let me know if you need more explanation, I'll try to edit the answer :) — Ben.T, Feb 09 '21 at 16:51

How can I split pandas dataframe into groups of peaks

1 Answers1