
I have two dataframes with different labels, df1 and df2.

df1 contains (amongst other things) a list of time intervals (start/stop). df2 contains a list of events with timestamps.

I want to check which time intervals in df1 include an event from df2. It doesn't matter which specific event, and it doesn't matter how many events. Yes/No is enough.

What I have (simplified):

df1

 Index  Start_time  Stop_time (other columns...)
 1      1           5
 2      8           10
 3      20          22
 4      23          40

df2

Index  Event_time (other columns...)
1      2
2      400
3      21
4      40

What I want:

df3

 Index  Start_time  Stop_time Event Event_time (optional) (other columns...)
 1      1           5         Yes   2
 2      8           10        No    NaN
 3      20          22        Yes   21
 4      23          40        Yes   40

Note that the (other columns) differ between the two dataframes. Therefore, a direct comparison raises "ValueError: Can only compare identically-labeled DataFrame objects".

How can I compare values in pandas DataFrame objects that are not identically labeled?
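A minimal reproduction of the error, using two hypothetical two-row frames (the exact message wording differs between pandas versions): pandas refuses the element-wise comparison before looking at any values, because the column labels of the two frames differ.

```python
import pandas as pd

# Hypothetical frames: same row labels, but different column labels.
df1 = pd.DataFrame({'Start_time': [1, 8], 'Stop_time': [5, 10]})
df2 = pd.DataFrame({'Event_time': [2, 400]})

error_message = None
try:
    # Element-wise == between two DataFrames needs identical row AND column labels.
    df1 == df2
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```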

EDIT: This and this look applicable here, but I have had no results so far.

jezrael
sudonym

1 Answer


Consider using Series.between:

df2 = df2[df2['Event_time'].between(<Start_time>, <Stop_time>)]  # both bounds are inclusive by default
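One way to fill in <Start_time> and <Stop_time> automatically is to call between once per interval of df1, taking the bounds from that row, and ask whether any event matched. This is a sketch on the sample data from the question; it makes one pass over df2 per interval, so its cost grows as len(df1) * len(df2).

```python
import pandas as pd

df1 = pd.DataFrame({'Start_time': [1, 8, 20, 23], 'Stop_time': [5, 10, 22, 40]})
df2 = pd.DataFrame({'Event_time': [2, 400, 21, 40]})

# Take the bounds from each row of df1 in turn; between() includes both ends by default.
has_event = [
    df2['Event_time'].between(start, stop).any()
    for start, stop in zip(df1['Start_time'], df1['Stop_time'])
]
df1['Event'] = ['Yes' if hit else 'No' for hit in has_event]
print(df1)
```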

EDIT:

In [151]: df1 = pd.DataFrame({'Start_time': [1, 8, 20, 23], 'Stop_time': [5, 10, 22, 40]})

In [152]: df2 = pd.DataFrame({'Event_time': [2, 400, 21, 40]})

In [153]: df2['Event'] = df2['Event_time'].between(df1['Start_time'], df1['Stop_time'])

In [154]: df2
Out[154]:
   Event_time  Event
0           2   True
1         400  False
2          21   True
3          40   True
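Passing whole Series to between compares the frames row by row after aligning on the index: event 0 is checked only against interval 0, event 1 only against interval 1, and so on. The sample gives the expected answer only because each event happens to fall in the interval with the same index; when df1 and df2 differ in length or index, the comparison raises "Can only compare identically-labeled Series objects". A sketch that checks every interval against every event, using NumPy broadcasting (a swapped-in technique, not part of the original answer), could look like this:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Start_time': [1, 8, 20, 23], 'Stop_time': [5, 10, 22, 40]})
df2 = pd.DataFrame({'Event_time': [2, 400, 21, 40]})

events = df2['Event_time'].to_numpy()            # shape (m,)
starts = df1['Start_time'].to_numpy()[:, None]   # shape (n, 1)
stops = df1['Stop_time'].to_numpy()[:, None]     # shape (n, 1)

# inside[i, j] is True when event j falls in interval i (both ends inclusive).
inside = (events >= starts) & (events <= stops)  # shape (n, m)
has_event = inside.any(axis=1)

df1['Event'] = np.where(has_event, 'Yes', 'No')
# Optional: the first matching event in df2 row order, NaN when there is none.
df1['Event_time'] = np.where(has_event, events[inside.argmax(axis=1)], np.nan)
print(df1)
```

Because it builds an n-by-m boolean matrix, memory grows with len(df1) * len(df2); the other columns of df1 are carried along untouched.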
SerialDev
  • The challenge is that df2 includes thousands of events, and the code runs once every 5 minutes. Therefore, <Start_time> and <Stop_time> need to be filled in automatically. – sudonym Feb 01 '17 at 12:01
  • Made an edit, consider accepting if it fixed your problem – SerialDev Feb 01 '17 at 12:38
  • I followed your instructions exactly and am still getting ValueError: Can only compare identically-labeled Series objects (Series rather than DataFrame objects this time, which is new) – sudonym Feb 01 '17 at 12:55
  • Try sorting the index first and let me know if it works – SerialDev Feb 01 '17 at 13:14
  • Did that according to the link below; it doesn't work: https://stackoverflow.com/questions/18548370/pandas-can-only-compare-identically-labeled-dataframe-objects-error – sudonym Feb 01 '17 at 13:21
  • If you could edit your question to add a representation of your data that we can use as a proof of concept, I will see what I can do – SerialDev Feb 01 '17 at 13:24
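Given the constraint in the comments (thousands of events, rerun every five minutes), a sort-based variant avoids building the full interval-by-event matrix. This is a sketch assuming numeric times as in the sample: after sorting the event times once, np.searchsorted finds, for each interval, where Start_time and Stop_time would be inserted, and the difference between those positions is the number of events in [Start_time, Stop_time]. The cost is roughly O((n + m) log m) instead of O(n * m).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Start_time': [1, 8, 20, 23], 'Stop_time': [5, 10, 22, 40]})
df2 = pd.DataFrame({'Event_time': [2, 400, 21, 40]})

# Sort the event times once; searchsorted requires sorted input.
events = np.sort(df2['Event_time'].to_numpy())

# lo: index of the first event >= Start_time; hi: index of the first event > Stop_time.
lo = np.searchsorted(events, df1['Start_time'].to_numpy(), side='left')
hi = np.searchsorted(events, df1['Stop_time'].to_numpy(), side='right')

# hi - lo is the number of events inside [Start_time, Stop_time], ends inclusive.
counts = hi - lo
df1['Event'] = np.where(counts > 0, 'Yes', 'No')
print(df1)
```

Whenever counts > 0 for a row, events[lo] for that row is the earliest event inside the interval, which can fill the optional Event_time column.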