I have two dataframe , showing below :
A Dataframe(about 50k row):
ID | Timestamp |
---|---|
1 | 2021/4/28 01:00:00 |
2 | 2021/4/27 01:03:02 |
1 | 2021/4/28 02:05:01 |
... | ... |
And B Dataframe(about 16M row):
ID | Timestamp | Activities |
---|---|---|
1 | 2021/4/28 00:59:58 | 30 |
2 | 2021/4/27 01:02:58 | 27 |
1 | 2021/4/28 02:04:07 | 44 |
1 | 2021/4/28 02:04:08 | 45 |
... | ... | ... |
I need to find out the Activities value in B dataframe which has the same ID and has the closest Timestamp in A dataframe , and incert to A Dataframe's new column.
Now I'm trying solving this question by simple loop , it works , but it's too slow. Here is the pseudo code :
- Iterate each row from A dataframe and get ID and Timestamp
- filter B by ID and make sure B Timestamp in B is less than or equal to A timestamp, sort decendingly by B's timestamp and get the 1st Activities value
- insert to A's new column in current row
- drop B's row which I have just used
- go forward to next row
But as I mentioned , this is too slow, how can I speed up this procedure ?