This question is a "sequel" to the one I asked here. I have two DataFrames that look like this:
df1 =
Timestamp CP_ID Con Value
0 2018-01-02 07:30:00.000 3107 1 6275.20
1 2018-01-02 07:45:00.000 3107 1 6275.95
2 2018-01-02 08:00:00.000 3107 1 6276.70
3 2018-01-02 08:15:00.000 3107 1 6277.45
4 2018-01-02 08:30:00.000 3107 1 6278.19
df2 =
dStart dEnd CP_ID Con A_Id
0 2018-03-12 18:58:53 2018-03-12 21:19:33 30554 1 03FC4BA8
1 2018-04-16 17:53:25 2018-04-16 20:22:52 30554 1 03FC4BA8
2 2018-05-10 19:19:47 2018-05-10 21:28:04 30554 1 03FC4BA8
3 2018-05-26 18:03:19 2018-05-26 20:49:23 30553 1 03FC4BA8
4 2018-08-10 18:26:29 2018-08-10 20:57:15 30827 1 03FC4BA8
df2 is a list of all sessions from all users (dStart/dEnd of each session).
df1 is user-specific session info (Timestamp throughout each session).
CP_ID and Con specify a unique location.
A_Id specifies a unique user.
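For reproducibility, here is one way to rebuild the sample frames shown above (values copied from the tables; timestamps parsed as datetime64 so the later comparisons work):

```python
import pandas as pd

# df1: user-specific readings (Timestamp, location keys, Value)
df1 = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2018-01-02 07:30:00", "2018-01-02 07:45:00",
        "2018-01-02 08:00:00", "2018-01-02 08:15:00",
        "2018-01-02 08:30:00",
    ]),
    "CP_ID": [3107] * 5,
    "Con": [1] * 5,
    "Value": [6275.20, 6275.95, 6276.70, 6277.45, 6278.19],
})

# df2: session intervals (dStart/dEnd) per location and user
df2 = pd.DataFrame({
    "dStart": pd.to_datetime([
        "2018-03-12 18:58:53", "2018-04-16 17:53:25",
        "2018-05-10 19:19:47", "2018-05-26 18:03:19",
        "2018-08-10 18:26:29",
    ]),
    "dEnd": pd.to_datetime([
        "2018-03-12 21:19:33", "2018-04-16 20:22:52",
        "2018-05-10 21:28:04", "2018-05-26 20:49:23",
        "2018-08-10 20:57:15",
    ]),
    "CP_ID": [30554, 30554, 30554, 30553, 30827],
    "Con": [1] * 5,
    "A_Id": ["03FC4BA8"] * 5,
})
```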
I want to merge these two tables so that I'm left with all of the info for a given user's sessions. My current solution, as proposed by Wen-Ben in my previous post, is something like:
df = pd.merge(df2, df1, how='left', on=['CP_ID', 'Con'])
df = df.loc[(df.Timestamp > df.dStart) & (df.Timestamp < df.dEnd)]
This accomplishes the task, but it is relatively slow. I need to do this operation for thousands of users, and both steps (.merge and .loc) account for a large share of my script's runtime.
Is there a more efficient way of doing this?