Consider a DataFrame with one column of timestamps and a second column of measured values.
import pandas as pd
data = {'Time': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
        'Value': [13, 54, 68, 94, 31, 68, 46, 46, 31, 53, 54, 85, 75, 42, 69]}
df = pd.DataFrame(data, columns=['Time', 'Value'])
We want to filter the dataframe to keep only the values at specific timings.
start = [2, 9, 14]
end = [5, 12, 15]
In this case, there are three timeframes we want to keep: from 2s to 5s, from 9s to 12s, and from 14s to 15s. I created a column that marks the boundaries of these timeframes.
df.loc[df["Time"].isin(start), "Observation"] = 'Start'
df.loc[df["Time"].isin(end), "Observation"] = 'End'
To filter the rows, I was thinking of filling the cells between each Start and End and then removing the rows that are still empty. This is where I'm stuck.
I had a go with using:
df = df.ffill()  # fillna(method='ffill') is deprecated in recent pandas versions
The issue with this approach is that I only need the fill to run forward from each "Start" (to populate the inside of the observation timeframe); I don't want it to keep filling after "End".
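To make the problem concrete, here is a small sketch of what the forward fill does to the example data. The names `filled` and `outside` are just illustrative, and I'm using `ffill()` as the equivalent of `fillna(method='ffill')`.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': list(range(1, 16)),
    'Value': [13, 54, 68, 94, 31, 68, 46, 46, 31, 53, 54, 85, 75, 42, 69],
})
start = [2, 9, 14]
end = [5, 12, 15]

# Tag the boundaries of each timeframe
df.loc[df['Time'].isin(start), 'Observation'] = 'Start'
df.loc[df['Time'].isin(end), 'Observation'] = 'End'

# Forward-fill the tags (illustrative name: filled)
filled = df.ffill()

# Times 6, 7, 8 and 13 lie outside every timeframe,
# yet they inherit the 'End' tag from the row before them
outside = filled[filled['Time'].isin([6, 7, 8, 13])]
print(outside)
```

After the fill, the rows outside the timeframes carry the same "End" tag as the real end rows, so they can no longer be told apart and dropped.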
My first idea was to create another set of timestamps by taking the timestamp at the end of each session and adding 1 to it:
import pandas as pd
data = {'Time': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
        'Value': [13, 54, 68, 94, 31, 68, 46, 46, 31, 53, 54, 85, 75, 42, 69]}
df = pd.DataFrame(data, columns=['Time', 'Value'])
start = [2, 9, 14]
end = [5, 12, 15]
out = [x+1 for x in end]
df.loc[df["Time"].isin(start), "Observation"] = 'Start'
df.loc[df["Time"].isin(end), "Observation"] = 'End'
df.loc[df["Time"].isin(out), "Observation"] = 'Out'
df = df.ffill()  # fillna(method='ffill') is deprecated in recent pandas versions
The issue with this approach is that, in the problem I actually need to solve, the timestamps are not seconds at regular intervals. They are milliseconds at irregular intervals, so adding 1 to create the "Out" tag is not reliable. It also feels like I'm overcomplicating something that should be simple: keeping only the observations between the start timestamps and the end timestamps (both included).
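For what it's worth, the "keep everything between start and end, inclusive" idea can be sketched without any tags or filling by comparing each timestamp against all the boundaries at once with NumPy broadcasting. This is only a sketch on the example data; `inside` and `kept` are names I made up, and it assumes `start` and `end` have the same length and are paired by position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': list(range(1, 16)),
    'Value': [13, 54, 68, 94, 31, 68, 46, 46, 31, 53, 54, 85, 75, 42, 69],
})
start = np.array([2, 9, 14])
end = np.array([5, 12, 15])

# Column vector of timestamps, shape (n_rows, 1), compared against
# the boundary arrays, shape (n_timeframes,), giving a 2D boolean grid
t = df['Time'].to_numpy()[:, None]
inside = ((t >= start) & (t <= end)).any(axis=1)  # True if in any timeframe

kept = df[inside]
print(kept['Time'].tolist())
```

Because it only compares values, this does not depend on the timestamps being evenly spaced, although the intermediate grid grows with the number of rows times the number of timeframes.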
Using a filter (filter/select rows of pandas dataframe by timestamp column) could be an option. However, depending on the session I'm looking at, there can be an arbitrary number of timeframes of interest. I tried to use a for loop that scans through the list of start timestamps and the list of end timestamps to build such a filter dynamically, but I didn't manage to get it working.
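The loop I have in mind would look roughly like the sketch below: start from an all-False mask and OR in one `Series.between` condition per timeframe (`between` includes both endpoints by default). The names `mask` and `filtered` are illustrative, and it assumes `start` and `end` are paired by position.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': list(range(1, 16)),
    'Value': [13, 54, 68, 94, 31, 68, 46, 46, 31, 53, 54, 85, 75, 42, 69],
})
start = [2, 9, 14]
end = [5, 12, 15]

# Start with "keep nothing", then switch on the rows of each timeframe
mask = pd.Series(False, index=df.index)
for s, e in zip(start, end):
    mask |= df['Time'].between(s, e)  # inclusive on both ends by default

filtered = df[mask]
print(filtered)
```

This handles any number of timeframes, since the lists can be of any length, at the cost of one pass over the column per timeframe.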
If anyone knows of a function that does exactly what I need, or has any tips, that would be great.
Thank you.