I have a stream of data in a CSV file: the first column holds a timestamp (date and time) and the second column holds a value. The data is plotted below. I need an algorithm that returns an array giving, for each peak, when it started and how long it lasted.
Here is the graph:
Here is some of the data from the CSV file:
Column1,Column2
2023-03-14 14:00:59.0,195.80
2023-03-14 14:02:06.0,174.20
2023-03-14 14:03:14.0,156.76
2023-03-14 14:04:21.0,142.36
2023-03-14 14:05:29.0,131.00
2023-03-14 14:06:37.0,122.00
2023-03-14 14:07:44.0,114.91
2023-03-14 14:08:52.0,109.18
2023-03-14 14:10:00.0,104.56
2023-03-14 14:11:07.0,100.74
2023-03-14 14:12:15.0,97.93
2023-03-14 14:13:22.0,95.45
2023-03-14 14:14:30.0,93.43
2023-03-14 14:15:37.0,91.85
2023-03-14 14:16:45.0,90.73
2023-03-14 14:17:53.0,89.49
2023-03-14 14:19:00.0,88.59
2023-03-14 14:20:08.0,87.91
2023-03-14 14:21:15.0,87.13
2023-03-14 14:22:23.0,86.68
2023-03-14 14:23:30.0,86.23
2023-03-14 14:24:38.0,86.23
2023-03-14 14:25:45.0,108.61
2023-03-14 14:26:53.0,142.70
2023-03-14 14:28:01.0,175.89
2023-03-14 14:29:08.0,203.79
2023-03-14 14:30:16.0,225.84
2023-03-14 14:31:23.0,241.25
2023-03-14 14:32:31.0,253.29
2023-03-14 14:33:39.0,262.18
2023-03-14 14:34:46.0,262.29
2023-03-14 14:35:54.0,262.29
2023-03-14 14:37:01.0,262.29
2023-03-14 14:38:09.0,260.83
2023-03-14 14:39:16.0,235.51
2023-03-14 14:40:24.0,208.85
2023-03-14 14:41:31.0,185.45
2023-03-14 14:42:39.0,166.33
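Because the CSV file itself can't be attached, a few of the rows above can be loaded from an inline string instead, which makes the problem reproducible for anyone reading. This is a minimal sketch that assumes pandas is installed (the question's code already uses it); the name `SAMPLE` is just chosen for illustration.

```python
import io

import pandas as pd

# A few of the sample rows above, pasted inline so no CSV file is needed.
SAMPLE = """Column1,Column2
2023-03-14 14:33:39.0,262.18
2023-03-14 14:34:46.0,262.29
2023-03-14 14:35:54.0,262.29
2023-03-14 14:37:01.0,262.29
2023-03-14 14:38:09.0,260.83
"""

# parse_dates converts Column1 to datetime64 values while reading,
# which replaces the separate pd.to_datetime call.
df = pd.read_csv(io.StringIO(SAMPLE), parse_dates=["Column1"])
print(df.dtypes)
```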
This is my peak-detection code, which is not working properly:
import pandas as pd
import numpy as np
from datetime import datetime
# Read the data from the CSV file
df = pd.read_csv('test.csv')
# Convert the first column to datetime format
df['Column1'] = pd.to_datetime(df['Column1'])
# Convert the second column to numeric type
df['Column2'] = pd.to_numeric(df['Column2'])
# Find the peaks using numpy
diff1 = np.diff(df['Column2'])
diff2 = np.diff(np.sign(diff1))
peaks, = np.where(diff2 < 0)
peak_durations = np.zeros(len(peaks), dtype=float)
start_times = np.zeros(len(peaks), dtype='datetime64[m]')
for i, peak_index in enumerate(peaks):
    start_index = np.argmax(df['Column2'][:peak_index]) # Index of start of peak
    end_index = np.argmin(df['Column2'][peak_index:]) + peak_index # Index of end of peak
    duration_minutes = (df['Column1'][end_index] - df['Column1'][start_index]).total_seconds() / 60
    peak_durations[i] = duration_minutes
    start_times[i] = df['Column1'][start_index]
# Convert start times to desired string format
start_times_str = [np.datetime_as_string(dt, unit='ms') for dt in start_times]
# Combine start times and durations into a 2-dimensional array
peaks_info = np.vstack((start_times_str, peak_durations)).T
print(peaks_info)
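Running the same sign-change step on only the 38 sample values above shows where the detection goes wrong. `np.diff` shortens the array by one at each call, so `diff2[j]` describes sample `j + 1`, not sample `j`. The flat top (three readings of 262.29) makes the sign go from +1 to 0 and then from 0 to -1 in two separate steps, so one peak is reported twice. Finally, `np.argmax(df['Column2'][:peak_index])` returns the highest sample anywhere before the peak, not the point where the rise began. This is a minimal sketch using plain NumPy arrays in place of the DataFrame column.

```python
import numpy as np

# The 38 sample values from the question, in file order.
values = np.array([
    195.80, 174.20, 156.76, 142.36, 131.00, 122.00, 114.91, 109.18,
    104.56, 100.74, 97.93, 95.45, 93.43, 91.85, 90.73, 89.49,
    88.59, 87.91, 87.13, 86.68, 86.23, 86.23, 108.61, 142.70,
    175.89, 203.79, 225.84, 241.25, 253.29, 262.18, 262.29, 262.29,
    262.29, 260.83, 235.51, 208.85, 185.45, 166.33,
])

# Same sign-change step as in the question.
diff2 = np.diff(np.sign(np.diff(values)))
peaks = np.where(diff2 < 0)[0]
print(peaks)  # [29 31]: one flat-topped peak is reported twice

# Same "start of peak" rule as in the question: the highest earlier sample.
starts = [int(np.argmax(values[:p])) for p in peaks]
print(starts)  # [28, 30]
```

Rows 28 and 30 are 14:32:31 and 14:34:46, which match the first start times in the output once the seconds are dropped. In the same way, `np.argmin(df['Column2'][peak_index:])` picks the lowest value anywhere after the peak rather than where the peak ends, which likely explains durations of hundreds of minutes; and on the full, noisier file every small wiggle adds another sign change.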
The result I am getting:
[['2023-03-14T14:32:00.000' '189.15']
['2023-03-14T14:34:00.000' '186.9']
['2023-03-14T14:34:00.000' '186.9']
['2023-03-14T14:34:00.000' '186.9']
['2023-03-14T14:34:00.000' '186.9']
['2023-03-14T14:34:00.000' '186.9']
['2023-03-14T14:34:00.000' '186.9']
['2023-03-14T14:34:00.000' '186.9']
['2023-03-14T14:34:00.000' '186.9']
['2023-03-14T14:34:00.000' '186.9']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '552.3166666666667']
['2023-03-14T14:34:00.000' '561.3333333333334']]
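The seconds are zeroed in every start time because `start_times` is allocated as `datetime64[m]`, so each timestamp stored in it is truncated to the minute before `np.datetime_as_string(..., unit='ms')` pads it back out. (The durations appear as quoted strings because `np.vstack` combines them with the string column into a single string array.) A minimal check using one timestamp from the sample data:

```python
import numpy as np

# Sample row 14:34:46 stored at minute resolution, as start_times does.
ts = np.datetime64("2023-03-14T14:34:46").astype("datetime64[m]")
as_text = np.datetime_as_string(ts, unit="ms")
print(as_text)  # 2023-03-14T14:34:00.000

# Keeping second resolution preserves the original time.
ts_s = np.datetime64("2023-03-14T14:34:46", "s")
print(np.datetime_as_string(ts_s, unit="s"))  # 2023-03-14T14:34:46
```

Allocating the array as `datetime64[s]` instead would keep the seconds.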
The result I expect is different from this response I am getting:
[datetime.datetime(2023, 3, 14, 14, 34) 186.9]
[datetime.datetime(2023, 3, 14, 14, 34) 186.9]
I want it to be [(2023 03 14 14 31 00.00) 28 mins], where the first part is the date and time at which the peak starts and the second value is the duration of the peak.
Note: I can't attach the CSV file here.
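One way to get output in the requested (start time, duration) shape is to define a peak as a contiguous run of samples above a threshold, and measure its duration from the first to the last sample in that run. This is a minimal sketch of that threshold technique, not the question's sign-change method: the function name `peak_durations` and its `threshold` parameter are hypothetical, and a suitable threshold has to be picked for the real data (for example by reading it off the graph).

```python
import numpy as np


def peak_durations(times, values, threshold):
    """Return (start_time, duration_minutes) for each run of samples above threshold.

    Hypothetical helper: a "peak" here is any contiguous run with value > threshold.
    """
    times = np.asarray(times, dtype="datetime64[s]")
    above = np.asarray(values, dtype=float) > threshold
    # Pad with False on both sides so every run has a rising and a falling edge.
    edges = np.diff(np.concatenate(([False], above, [False])).astype(int))
    starts = np.flatnonzero(edges == 1)     # first sample above the threshold
    ends = np.flatnonzero(edges == -1) - 1  # last sample above the threshold
    result = []
    for s, e in zip(starts, ends):
        # Dividing two timedeltas gives a plain float number of minutes.
        minutes = (times[e] - times[s]) / np.timedelta64(1, "m")
        result.append((times[s], float(minutes)))
    return result
```

With the question's DataFrame it could be called as `peak_durations(df['Column1'].to_numpy(), df['Column2'].to_numpy(), threshold=200)`, where 200 is only a placeholder value. Because duration is measured between samples, its resolution is limited to the roughly 67-second sampling interval, and a peak that is still running at the end of the file is cut off at the last sample.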