I have a pandas dataframe df
which I queried using a for loop that captures the events by considering the dummy variable of the Status
column. The logic is that when Status
changes from 0
to 1
, it marks the timestamp at 1
as the start of the event, when Status
changes from 1
to 0
, it marks the timestamp at 0
as the end of the event.
df
looks like this:
ID Timestamp Value Status
103177 64 2010-09-21 23:13:21.090 21.5 1.0
252019 64 2010-09-22 00:44:14.890 21.5 1.0
271381 64 2010-09-22 00:44:15.890 21.5 0.0
268939 64 2010-09-22 00:44:17.890 23.0 0.0
259875 64 2010-09-22 00:44:18.440 23.0 1.0
18870 64 2010-09-22 00:44:19.890 24.5 1.0
205910 64 2010-09-22 00:44:23.440 24.5 0.0
103865 64 2010-09-22 01:04:33.440 23.5 0.0
152281 64 2010-09-22 01:27:01.790 22.5 0.0
138988 64 2010-09-22 02:18:52.850 21.5 0.0
...
reproducible example:
from pandas import *
from numpy import nan
df = pd.DataFrame({'ID': {103177: 64,
252019: 64,
271381: 64,
268939: 64,
259875: 64,
18870: 64,
205910: 64,
103865: 64,
152281: 64,
138988: 64},
'Timestamp': {103177: Timestamp('2010-09-21 23:13:21.090000'),
252019: Timestamp('2010-09-22 00:44:14.890000'),
271381: Timestamp('2010-09-22 00:44:15.890000'),
268939: Timestamp('2010-09-22 00:44:17.890000'),
259875: Timestamp('2010-09-22 00:44:18.440000'),
18870: Timestamp('2010-09-22 00:44:19.890000'),
205910: Timestamp('2010-09-22 00:44:23.440000'),
103865: Timestamp('2010-09-22 01:04:33.440000'),
152281: Timestamp('2010-09-22 01:27:01.790000'),
138988: Timestamp('2010-09-22 02:18:52.850000')},
'Value': {103177: 21.5,
252019: 21.5,
271381: 21.5,
268939: 23.0,
259875: 23.0,
18870: 24.5,
205910: 24.5,
103865: 23.5,
152281: 22.5,
138988: 21.5},
'Status': {103177: 1.0,
252019: 1.0,
271381: 0.0,
268939: 0.0,
259875: 1.0,
18870: 1.0,
205910: 0.0,
103865: 0.0,
152281: 0.0,
138988: 0.0}})
df
query for-loop:
def my_func1(df):
dataframe = []
Start_time = []
StartValue = []
End_time = []
End_Value = []
StartStatus = []
End_Status = []
ID = []
state = 0
for i in range(1, len(df.index)):
if state == 0:
if (df.loc[i, 'Status'] > 0):
Start_time = df.loc[i, 'Timestamp']
StartValue = df.loc[i, 'Value']
StartStatus = df.loc[i, 'Status']
ID = df.loc[i, 'ID']
state = 1
else:
continue
elif state == 1:
if (df.loc[i, 'Status'] == 0):
End_time = df.loc[i, 'Timestamp']
End_Value = df.loc[i, 'Value']
End_Status = df.loc[i, 'Status']
state = 0
else:
continue
dataframe.append([ ID, Start_time, StartValue, StartStatus, End_time, End_Value, End_Status])
else:
continue
# Dataframe
output_table = pd.DataFrame(dataframe, columns= ["ID", "StartTime", "StartValue", "StartStatus", "EndTime", "EndValue", "EndStatus"])
return output_table
and output table format looks like this(just an example, not same data as above):
ID StartTime StartValue StartStatus EndTime EndValue EndStatus
0 64 2010-09-22 00:44:14.890 21.5 1.0 2010-09-22 00:44:15.890 21.5 0.0
1 64 2010-09-22 00:44:18.440 23.0 1.0 2010-09-22 00:44:23.440 24.5 0.0
2 64 2010-09-23 00:06:48.970 100.0 1.0 2010-09-23 00:06:52.970 100.0 0.0
3 64 2010-09-23 00:07:14.970 100.0 1.0 2010-09-23 00:07:23.970 100.0 0.0
4 64 2010-09-23 02:07:52.990 100.0 1.0 2010-09-23 02:08:13.990 100.0 0.0
Now I want to look into each of the events inside the original dataframe df
using each row in output_table
, where each event can be found by filtering Timestamp
in between StartTime
and EndTime
. How can I apply another of my own defined function my_func2
to the data of each these individual event in df
? (my_func2
is supposed to create a label in output_table
that represents if each event satisfies some condition by considering the Value
column.)
Edit:
To be more specific, my_func2
measures continuous time interval where Value
stays above 30
for at least 10 seconds. Then I want to find the overlap of my_func1
and my_func2
, and eventually calculate the median
value in this time interval for each event.
I guess I could work out the overlap by comparing the timestamps of the start and end points of the 2 events(calculated by the two functions). What I need help with is how to find the original data in df
for calculating the median value for each of the individual event? Any idea is appreciated.