Filling Pandas column between timestamp boundaries

Question

Let's consider a dataframe with a column of timestamps and a second column of measured values.

    import pandas as pd

    data = {'Time':  [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
            'Value': [13,54,68,94,31,68,46,46,31,53,54,85,75,42,69]}

    df = pd.DataFrame(data, columns=['Time', 'Value'])

We want to filter the dataframe to keep only the rows that fall within specific time windows.

    start = [2, 9, 14]
    end = [5, 12, 15]

In this case, we have 3 timeframes we want to keep: from 2s to 5s, from 9s to 12s, and from 14s to 15s. I created a column that marks the boundaries of the timeframes we want to keep.

    df.loc[df["Time"].isin(start), "Observation"] = 'Start'
    df.loc[df["Time"].isin(end), "Observation"] = 'End'

To filter the rows, I was thinking of filling the cells between Start and End and then removing the empty rows. This is where I'm stuck.

I had a go at using:

    df = df.ffill()  # forward-fill every NaN with the last valid value above it

The issue with this approach is that I only want the fill to start from each "Start" (to populate the rows inside the observation timeframe), but I don't want it to continue past "End".
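One way the forward-fill idea could be made to work, sketched here under the assumption that every start and end timestamp appears exactly in the `Time` column and that the windows do not overlap: forward-fill a helper copy of `Observation`, then keep the rows whose filled value is `Start` together with the `End` rows themselves. Rows after an `End` inherit `End` from the fill but were originally empty, so they drop out. The names `filled`, `mask` and `result` are only illustrative.

```python
import pandas as pd

data = {'Time':  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
        'Value': [13, 54, 68, 94, 31, 68, 46, 46, 31, 53, 54, 85, 75, 42, 69]}
df = pd.DataFrame(data)

start = [2, 9, 14]
end = [5, 12, 15]

# Mark the window boundaries, as in the question
df.loc[df["Time"].isin(start), "Observation"] = "Start"
df.loc[df["Time"].isin(end), "Observation"] = "End"

# Forward-fill a helper Series only: rows after a "Start" become "Start",
# rows after an "End" become "End"
filled = df["Observation"].ffill()

# Keep rows inside a window ("Start" carried forward) plus the "End" rows;
# rows that only inherited "End" lie between windows and are dropped
mask = filled.eq("Start") | df["Observation"].eq("End")
result = df[mask]
print(result)
```

`Series.ffill()` is used instead of `fillna(method='ffill')`, since the `method` argument of `fillna` is deprecated in recent pandas versions.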

My first idea was to create another set of timestamps by taking the timestamp at the end of each session and adding 1 to it:

    import pandas as pd

    data = {'Time':  [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
            'Value': [13,54,68,94,31,68,46,46,31,53,54,85,75,42,69]}

    df = pd.DataFrame(data, columns=['Time', 'Value'])

    start = [2, 9, 14]
    end = [5, 12, 15]
    out = [x+1 for x in end]  # timestamp assumed to follow each End

    df.loc[df["Time"].isin(start), "Observation"] = 'Start'
    df.loc[df["Time"].isin(end), "Observation"] = 'End'
    df.loc[df["Time"].isin(out), "Observation"] = 'Out'

    # Rows after Start become 'Start'; rows after Out become 'Out'
    df = df.ffill()

The issue with this approach is that, in the problem I actually need to solve, the timestamps are not seconds at regular intervals but milliseconds at random intervals, so using +1 to create the "Out" tag is not reliable. It also feels like I'm overcomplicating something that should be simple: keeping only the observations between the start timestamps and the end timestamps (both timestamps included).
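If the boundaries do not necessarily coincide with recorded timestamps (for example with irregular millisecond data), one option is `pandas.merge_asof`: it attaches to each row the most recent window start at or before its `Time`, and a row is then kept if its `Time` does not exceed that window's end. This is a sketch on made-up data that assumes the windows do not overlap; `windows`, `merged` and `kept` are hypothetical names. `merge_asof` needs both frames sorted on their keys and key columns of matching dtype, hence the float boundaries.

```python
import pandas as pd

# Made-up irregular timestamps in milliseconds (not the question's data)
df = pd.DataFrame({
    "Time":  [0.0, 3.7, 10.2, 15.0, 21.9, 30.4, 41.1, 47.5, 52.0, 60.3],
    "Value": [13, 54, 68, 94, 31, 68, 46, 46, 31, 53],
})

# Window boundaries as floats so their dtype matches df["Time"];
# they do not have to be timestamps that appear in df
windows = pd.DataFrame({"start": [3.0, 41.1], "end": [22.0, 52.0]})

# For each row, attach the latest window whose start <= Time
# (merge_asof requires both frames to be sorted on their keys)
merged = pd.merge_asof(
    df.sort_values("Time"),
    windows.sort_values("start"),
    left_on="Time",
    right_on="start",
    direction="backward",
)

# Keep rows that have not passed their window's end; rows before the
# first start have NaN in "end", so the comparison is False for them
kept = merged[merged["Time"] <= merged["end"]]
print(kept[["Time", "Value"]])
```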

Using a filter (filter/select rows of pandas dataframe by timestamp column) could be an option. However, depending on the session I'm looking at, there can be an arbitrary number of timeframes of interest. I tried to use a for loop over the list of start timestamps and the list of end timestamps to build such a filter dynamically, but I didn't manage to get it working.
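The loop idea can work if each iteration ORs one more condition into a boolean mask. A minimal sketch on the question's data, using `Series.between`, which includes both endpoints by default; `mask` and `filtered` are illustrative names.

```python
import pandas as pd

data = {'Time':  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
        'Value': [13, 54, 68, 94, 31, 68, 46, 46, 31, 53, 54, 85, 75, 42, 69]}
df = pd.DataFrame(data)

start = [2, 9, 14]
end = [5, 12, 15]

# Start from an all-False mask and OR in one condition per (start, end) pair,
# so any number of windows is handled
mask = pd.Series(False, index=df.index)
for s, e in zip(start, end):
    # Series.between includes both endpoints by default
    mask |= df["Time"].between(s, e)

filtered = df[mask]
print(filtered)
```

Nothing in this loop depends on the spacing of the timestamps, so it should apply unchanged to irregular millisecond values.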

If anyone knows of a function that does exactly what I need, or has any tips, that would be great.

Thank you.

score 0 · Accepted Answer · answered Oct 31 '20 at 23:55

How about creating a function that zips your start and end lists and checks whether the element lies within a given pair of values:

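    def catch_df(start, end, element):
        # Return a label for the first (start, end) window that contains
        # element, boundaries included
        for i, (lo, hi) in enumerate(zip(start, end)):
            if lo <= element <= hi:
                return "df{}".format(i)
        return None  # element is outside every window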

and apply that function to the values in the Time column of df:

    df['Result'] = df['Time'].apply(lambda x: catch_df(start, end, x))

As a result, you get the following dataframe, which can easily be filtered, for example to drop the rows whose Result is None:

        Time  Value  Observation  Result
    0      1     13          NaN    None
    1      2     54        Start     df0
    2      3     68          NaN     df0
    3      4     94          NaN     df0
    4      5     31          End     df0
    5      6     68          NaN    None
    6      7     46          NaN    None
    7      8     46          NaN    None
    8      9     31        Start     df1
    9     10     53          NaN     df1
    10    11     54          NaN     df1
    11    12     85          End     df1
    12    13     75          NaN    None
    13    14     42        Start     df2
    14    15     69          End     df2
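As a vectorized alternative to the row-wise `apply` (a sketch, not part of the original answer), the windows can be stored in a `pandas.IntervalIndex` closed on both sides; its `get_indexer` method returns the position of the window containing each `Time`, or -1 when none does. This assumes the windows do not overlap, since `get_indexer` raises an error for overlapping intervals. The `dfN` labels simply mimic the ones produced by `catch_df`.

```python
import pandas as pd

data = {'Time':  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
        'Value': [13, 54, 68, 94, 31, 68, 46, 46, 31, 53, 54, 85, 75, 42, 69]}
df = pd.DataFrame(data)

start = [2, 9, 14]
end = [5, 12, 15]

# One closed interval [start, end] per window (windows must not overlap)
windows = pd.IntervalIndex.from_arrays(start, end, closed="both")

# Position of the window containing each Time, or -1 if there is none
pos = windows.get_indexer(df["Time"])

# Same "dfN" labels as catch_df, with None outside every window
df["Result"] = [f"df{i}" if i >= 0 else None for i in pos]
filtered = df[df["Result"].notna()]
print(filtered)
```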
