
My issue is very simple, but I just can't wrap my head around it: I have two dataframes:

  1. A time series dataframe with two columns: Timestamp and DataValue
  2. A time interval dataframe with start, end timestamps and a label

What I want to do:

Add a third column to the timeseries that yields the labels according to the time interval dataframe.

Every timepoint needs to have an assigned label designated by the time interval dataframe.

This code works:

TimeSeries_labelled = TimeSeries.copy(deep=True)
TimeSeries_labelled["State"] = 0
for index in Timeintervals_States.index:
    for entry in TimeSeries_labelled.index:
        if Timeintervals_States.loc[index, "start"] <= TimeSeries_labelled.loc[entry, "Timestamp"] <= Timeintervals_States.loc[index, "end"]:
            TimeSeries_labelled.loc[entry, "State"] = Timeintervals_States.loc[index, "state"]

But it is really slow. I tried to make it shorter and faster with Python's built-in filtering, but failed miserably. Please help!

ThunderHorn
fabioloso
  • Hi! For Pandas questions, it is best to include some of your actual data: https://stackoverflow.com/a/20159305/463796 – w-m Aug 08 '18 at 08:40

1 Answer


I don't really know about TimeSeries, but with a dataframe whose timestamps are datetime objects you could use something like the following:

import pandas as pd

# Create the third column in the target dataframe, initially empty
df_timeseries['label'] = pd.Series('', index=df_timeseries.index)
# Loop over the dataframe containing start and end timestamps
for index, row in df_start_end.iterrows():
    # Boolean mask selecting the timestamps inside this interval.
    # The bounds are inclusive, matching the <= comparisons in the question.
    mask = (df_timeseries['timestamp'] >= row['start']) & (df_timeseries['timestamp'] <= row['end'])
    df_timeseries.loc[mask, 'label'] = row['label']

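For each row of the start/end dataframe, this assigns that row's label to every row of the time series whose timestamp falls inside the interval. It loops only over the intervals; the comparison against the timestamps is vectorized, so it avoids the nested per-row loop in the question.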
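If the intervals do not overlap, the remaining loop can probably be dropped as well by looking each timestamp up in a pandas IntervalIndex. The following is only a sketch under that assumption: the sample data is made up for illustration, the column names follow the answer's code rather than the question's, and `get_indexer` raises an error if the intervals overlap.

```python
import numpy as np
import pandas as pd

# Made-up sample data, for illustration only
df_timeseries = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-08-08 08:00", "2018-08-08 08:30",
        "2018-08-08 09:00", "2018-08-08 10:30",
    ]),
    "value": [1.0, 2.0, 3.0, 4.0],
})
df_start_end = pd.DataFrame({
    "start": pd.to_datetime(["2018-08-08 08:00", "2018-08-08 09:00"]),
    "end": pd.to_datetime(["2018-08-08 08:59", "2018-08-08 10:00"]),
    "label": ["idle", "running"],
})

# Build one closed interval per row (inclusive on both ends).
# Assumption: the intervals do not overlap.
intervals = pd.IntervalIndex.from_arrays(
    df_start_end["start"], df_start_end["end"], closed="both"
)

# Position of the interval containing each timestamp, or -1 if none does
pos = intervals.get_indexer(df_timeseries["timestamp"])

# Map positions to labels; timestamps outside every interval get ''
labels = df_start_end["label"].to_numpy()
df_timeseries["label"] = np.where(pos >= 0, labels[pos], "")

print(df_timeseries)
```

With this sample data the last timestamp falls outside both intervals, so its label stays empty. For overlapping intervals the mask-based loop above is the safer choice, since there the later interval simply overwrites the earlier one.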

Bruce Swain