0

I'm working on a timeseries dataframe which looks like this and has data from January to August 2020.

    Timestamp            Value
    2020-01-01 00:00:00 -68.95370
    2020-01-01 00:05:00 -67.90175
    2020-01-01 00:10:00 -67.45966
    2020-01-01 00:15:00 -67.07624
    2020-01-01 00:20:00 -67.30549
    .....
    2020-07-01 00:00:00 -65.34212

I'm trying to apply a filter on the previous dataframe using the columns start_time and end_time in the dataframe below:

  start_time            end_time
 2020-01-12 16:15:00 2020-01-13 16:00:00
 2020-01-26 16:00:00 2020-01-26 16:10:00
 2020-04-12 16:00:00 2020-04-13 16:00:00
 2020-04-20 16:00:00 2020-04-21 16:00:00
 2020-05-02 16:00:00 2020-05-03 16:00:00

The output should assign all values which are not within the start and end time as zero and retain values for the start and end times specified in the filter. I tried applying two simultaneous filters for start and end time but didn't work.

Any help would be appreciated.

Chinmay
  • 147
  • 2
  • 8

2 Answers2

2

Idea is create all masks by Series.between in list comprehension, then join with logical_or by np.logical_or.reduce and last pass to Series.where:

print (df1)
            Timestamp     Value
0 2020-01-13 00:00:00 -68.95370 <- changed data for match
1 2020-01-01 00:05:00 -67.90175
2 2020-01-01 00:10:00 -67.45966
3 2020-01-01 00:15:00 -67.07624
4 2020-01-01 00:20:00 -67.30549
5 2020-07-01 00:00:00 -65.34212

L = [df1['Timestamp'].between(s, e) for s, e in df2[['start_time','end_time']].values]
m = np.logical_or.reduce(L)

df1['Value'] = df1['Value'].where(m, 0)
print (df1)
            Timestamp    Value
0 2020-01-13 00:00:00 -68.9537
1 2020-01-01 00:05:00   0.0000
2 2020-01-01 00:10:00   0.0000
3 2020-01-01 00:15:00   0.0000
4 2020-01-01 00:20:00   0.0000
5 2020-07-01 00:00:00   0.0000
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

Solution using outer join of merge method and query:

print(df1)
            timestamp     Value <- changed Timestamp to timestamp to avoid name conflict in query
0 2020-01-13 00:00:00 -68.95370 <- also changed data for match
1 2020-01-01 00:05:00 -67.90175
2 2020-01-01 00:10:00 -67.45966
3 2020-01-01 00:15:00 -67.07624
4 2020-01-01 00:20:00 -67.30549
5 2020-07-01 00:00:00 -65.34212

df1.loc[df1.index.difference(df1.assign(key=0).merge(df2.assign(key=0), how = 'outer')\
           .query("timestamp >= start_time and timestamp < end_time").index),"Value"] = 0

result:

            timestamp    Value
0 2020-01-13 00:00:00 -68.9537
1 2020-01-01 00:05:00   0.0000
2 2020-01-01 00:10:00   0.0000
3 2020-01-01 00:15:00   0.0000
4 2020-01-01 00:20:00   0.0000
5 2020-07-01 00:00:00   0.0000

Key assign(key=0) is added to both dataframes to produce cartesian product.

ipj
  • 3,488
  • 1
  • 14
  • 18