I have a large data set that contains time and pressure data. The pressure is affected by the operation of a pump, and I want to find the regions of the data where the pump was operating. Because the pressure rises very quickly when the pump starts, I tried filtering on the pressure values. A sample of the data, followed by the code I used:

         Time   Count  Pressure [KPa]
300  09:49:52  575.54           36.05
301  09:49:54  577.46           36.07
302  09:49:56  579.38           36.11
303  09:49:58  581.30           36.16
304  09:50:00  583.22           36.03
305  09:50:02  585.14           36.09
306  09:50:04  587.05           36.05
307  09:50:06  588.97           36.07
308  09:50:08  590.89           36.16
309  09:50:10  592.81           36.16
310  09:50:12  594.73           36.11
311  09:50:14  596.65           36.22
312  09:50:15  598.57           36.15
313  09:50:17  600.48           36.15
314  09:50:19  602.40           36.16
315  09:50:21  604.32           36.18
316  09:50:23  606.24           36.18
317  09:50:25  608.16           36.18
318  09:50:27  610.08           36.24
319  09:50:29  612.00           35.26
320  09:50:31  613.91           33.65
321  09:50:33  615.83           32.23
322  09:50:35  617.75           30.76
323  09:50:37  619.67           29.55
324  09:50:38  621.59           28.17
325  09:50:40  623.51           26.96
326  09:50:42  625.42           26.05
327  09:50:44  627.34           25.12
328  09:50:46  629.26           24.35
329  09:50:48  631.18           23.78
330  09:50:50  633.10           23.08
331  09:50:52  635.02           22.08
332  09:50:54  636.94           21.85
333  09:50:56  638.85           21.19
334  09:50:58  640.77           20.85
335  09:51:00  642.69           20.15
336  09:51:02  644.61           20.10
337  09:51:03  646.53           19.36
338  09:51:05  648.45           19.17
339  09:51:07  650.36           18.64
340  09:51:09  652.28           18.32
341  09:51:11  654.20           18.15
342  09:51:13  656.12           17.58
343  09:51:15  658.04           17.49
344  09:51:17  659.96           17.34
345  09:51:19  661.88           16.84
346  09:51:21  663.79           16.39
347  09:51:23  665.71           16.41
348  09:51:25  667.63           15.90
349  09:51:27  669.55           15.63

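import pandas as pd
from tqdm import tqdm

# p_data is assumed to be an iterable of CSV file paths defined elsewhere
for i, file in tqdm(enumerate(p_data), desc='Reading files'):
    df = pd.read_csv(file)
    df['Time'] = pd.to_datetime(df['Time'])
    dp = df['Pressure [KPa]'].diff()  # pressure change between consecutive samples
    s_indices = df.index[(dp > 2.2) & (dp <= 5)].tolist()  # fast rises only
    filtered_indices = [idx for idx in s_indices if 2 < df.loc[idx, 'Pressure [KPa]'] < 50]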

This gives a list of indices that is pretty accurate for the pressure build-up. The problem is that I get more than one index per pump instance:

filtered_indices = [110, 111, 112, 113, 114, 1004, 1005, 1006, 1007, 1931, 3046, 6543, 6544, 6545, 7258, 7259, 8263, 8264]

How can I get only the first index for each "index zone"?
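One possible way to think about it, as a minimal sketch: treat indices that follow each other without a gap as one zone and keep only the first index of each run. The function name `first_of_each_zone` and the `max_gap` parameter are hypothetical, introduced only for illustration; the list is the one shown above.

```python
def first_of_each_zone(indices, max_gap=1):
    """Keep the first index of every run of (nearly) consecutive indices.

    max_gap is an assumed tuning knob: two indices belong to the same
    zone when they are at most max_gap apart.
    """
    starts = []
    prev = None
    for idx in sorted(indices):
        # A new zone begins at the first index, or after a jump larger than max_gap
        if prev is None or idx - prev > max_gap:
            starts.append(idx)
        prev = idx
    return starts


filtered_indices = [110, 111, 112, 113, 114, 1004, 1005, 1006, 1007,
                    1931, 3046, 6543, 6544, 6545, 7258, 7259, 8263, 8264]

zone_starts = first_of_each_zone(filtered_indices)
print(zone_starts)  # [110, 1004, 1931, 3046, 6543, 7258, 8263]
```

If a single build-up sometimes skips a sample, a larger `max_gap` would merge the pieces into one zone. The same idea can be expressed in pandas by comparing `Series.diff()` of the index list against the allowed gap.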

Also, the pressure drop is not as straightforward as the pressure build-up (the pressure takes longer to drop than to build up). How can I find the points in the pressure data where the slope changes? I can add a plot of the data if needed (the data itself is very large).
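For the slower pressure drop, one idea (a rough sketch, not a tested solution for the full data set) is to look for the start of a sustained fall: several consecutive samples whose step is below a negative threshold. The function name `find_drop_starts` and the values `threshold=-0.5` and `min_run=3` are assumptions picked by eye from the sample above and would need tuning. The pressures are rows 315 to 335 of the sample.

```python
def find_drop_starts(pressure, threshold=-0.5, min_run=3):
    """Return positions where a sustained pressure drop begins.

    A drop is counted only when at least min_run consecutive steps are
    below threshold (kPa per sample); both values are assumed, not derived.
    """
    starts = []
    run_start = None  # position where the current falling run began
    run_len = 0       # number of consecutive falling steps so far
    for pos in range(1, len(pressure)):
        step = pressure[pos] - pressure[pos - 1]
        if step < threshold:
            if run_len == 0:
                run_start = pos
            run_len += 1
            # Record the run once, at the moment it becomes long enough
            if run_len == min_run:
                starts.append(run_start)
        else:
            run_len = 0
    return starts


first_row = 315  # row label of the first value below, taken from the sample
sample_pressure = [36.18, 36.18, 36.18, 36.24, 35.26, 33.65, 32.23,
                   30.76, 29.55, 28.17, 26.96, 26.05, 25.12, 24.35,
                   23.78, 23.08, 22.08, 21.85, 21.19, 20.85, 20.15]

drop_rows = [first_row + pos for pos in find_drop_starts(sample_pressure)]
print(drop_rows)  # [319]
```

On a full DataFrame the input could be `df['Pressure [KPa]'].tolist()`, in which case the returned positions are row positions. Requiring a minimum run length keeps isolated noisy steps, such as the single dips later in the decay, from being reported as new drops.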

  • 1
    please provide a [minimal reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and the matching expected output – mozway Jul 25 '23 at 07:29
