I have a large data set containing time and pressure data. The pressure is affected by the operation of a pump, and I want to find the regions in the data where the pump was operating. Since the pressure rises very quickly when the pump starts, I tried filtering on the pressure values. A sample of the data, followed by the code I used:
Time Count Pressure [KPa]
300 09:49:52 575.54 36.05
301 09:49:54 577.46 36.07
302 09:49:56 579.38 36.11
303 09:49:58 581.30 36.16
304 09:50:00 583.22 36.03
305 09:50:02 585.14 36.09
306 09:50:04 587.05 36.05
307 09:50:06 588.97 36.07
308 09:50:08 590.89 36.16
309 09:50:10 592.81 36.16
310 09:50:12 594.73 36.11
311 09:50:14 596.65 36.22
312 09:50:15 598.57 36.15
313 09:50:17 600.48 36.15
314 09:50:19 602.40 36.16
315 09:50:21 604.32 36.18
316 09:50:23 606.24 36.18
317 09:50:25 608.16 36.18
318 09:50:27 610.08 36.24
319 09:50:29 612.00 35.26
320 09:50:31 613.91 33.65
321 09:50:33 615.83 32.23
322 09:50:35 617.75 30.76
323 09:50:37 619.67 29.55
324 09:50:38 621.59 28.17
325 09:50:40 623.51 26.96
326 09:50:42 625.42 26.05
327 09:50:44 627.34 25.12
328 09:50:46 629.26 24.35
329 09:50:48 631.18 23.78
330 09:50:50 633.10 23.08
331 09:50:52 635.02 22.08
332 09:50:54 636.94 21.85
333 09:50:56 638.85 21.19
334 09:50:58 640.77 20.85
335 09:51:00 642.69 20.15
336 09:51:02 644.61 20.10
337 09:51:03 646.53 19.36
338 09:51:05 648.45 19.17
339 09:51:07 650.36 18.64
340 09:51:09 652.28 18.32
341 09:51:11 654.20 18.15
342 09:51:13 656.12 17.58
343 09:51:15 658.04 17.49
344 09:51:17 659.96 17.34
345 09:51:19 661.88 16.84
346 09:51:21 663.79 16.39
347 09:51:23 665.71 16.41
348 09:51:25 667.63 15.90
349 09:51:27 669.55 15.63
import pandas as pd
from tqdm import tqdm

for i, file in enumerate(tqdm(p_data, desc='Reading files')):
    df = pd.read_csv(file)
    df['Time'] = pd.to_datetime(df['Time'])
    # Keep rows where the pressure rose by more than 2.2 and at most 5 kPa since the previous sample
    dp = df['Pressure [KPa]'].diff()
    s_indices = df.index[(dp > 2.2) & (dp <= 5)].tolist()
    filtered_indices = [idx for idx in s_indices if 2 < df.loc[idx, 'Pressure [KPa]'] < 50]
This gives a list of indices that is pretty accurate for the pressure build-up. The problem is that I get more than one index per pump instance:
filtered_indices = [110, 111, 112, 113, 114, 1004, 1005, 1006, 1007, 1931, 3046, 6543, 6544, 6545, 7258, 7259, 8263, 8264].
How can I get only the first index of each "index zone"?
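To make the "index zone" idea concrete, here is a minimal sketch of the grouping I am after. It assumes that indices belonging to the same pump start are consecutive, or at most `max_gap` apart; `first_of_each_run` and `max_gap` are names made up for this illustration, not part of pandas.

```python
def first_of_each_run(indices, max_gap=1):
    """Return the first index of every run of near-consecutive indices.

    max_gap is a hypothetical tuning parameter: two neighbouring indices
    belong to the same run when they differ by at most max_gap.
    """
    starts = []
    previous = None
    for idx in sorted(indices):
        # A new run begins at the first index, or after a jump larger than max_gap.
        if previous is None or idx - previous > max_gap:
            starts.append(idx)
        previous = idx
    return starts


filtered_indices = [110, 111, 112, 113, 114, 1004, 1005, 1006, 1007,
                    1931, 3046, 6543, 6544, 6545, 7258, 7259, 8263, 8264]
pump_starts = first_of_each_run(filtered_indices)
print(pump_starts)  # [110, 1004, 1931, 3046, 6543, 7258, 8263]
```

Raising `max_gap` would merge zones that are interrupted by a sample or two that missed the threshold; whether that is desirable depends on how noisy the build-up is.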
Also, the pressure drop is not as clear-cut as the build-up, since the pressure takes longer to fall than to rise. How can I find the points where the slope of the pressure data changes? I can add a plot of the data if needed (the data set itself is very large).
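For the drop, one possible direction is to look for a sustained run of negative differences instead of a single large step. The sketch below is only an illustration under assumptions: `drop_starts`, `threshold` (kPa per sample) and `min_run` are hypothetical names and values chosen to fit the sample rows above, and would need tuning on the full data.

```python
def drop_starts(pressure, threshold=-0.5, min_run=3):
    """Return positions where a sustained pressure drop begins.

    A drop counts as sustained when at least min_run consecutive
    sample-to-sample differences are below threshold. Both parameters
    are hypothetical tuning values, not derived from the full data set.
    """
    starts = []
    run_length = 0
    for i in range(1, len(pressure)):
        if pressure[i] - pressure[i - 1] < threshold:
            run_length += 1
            if run_length == min_run:
                # Position of the first falling sample in this run.
                starts.append(i - min_run + 1)
        else:
            run_length = 0
    return starts


# Pressure values of rows 314 to 325 from the sample above
sample = [36.16, 36.18, 36.18, 36.18, 36.24, 35.26,
          33.65, 32.23, 30.76, 29.55, 28.17, 26.96]
first_row = 314
print([first_row + pos for pos in drop_starts(sample)])  # [319]
```

On a DataFrame the same differences come from `df['Pressure [KPa]'].diff()`, and smoothing them first with `.rolling(window).mean()` may help if the signal is noisy. This only locates where the fall begins; finding where the decay flattens out again would need a separate criterion.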