Join two dataframes based on non-matching date and time using pandas and nearest-method

Question

I have edited the question based on @user17242583 comments and added more data and the output I get using just the example data.

I have two csv-files that I have read into two DataFrames, df1 and df2.

df1 (11 rows):

    ID          DateTime  Event_no
0    1   01/01/2019 0:02      1003
1    2   01/01/2019 0:28      1015
2    3   01/01/2019 1:43      1029
3    4   01/01/2019 2:12      1042
4    5   01/01/2019 2:58      1055
5    6  01/01/2019 11:02      1068
6    7  01/01/2019 12:02      1081
7    8  01/01/2019 13:46      1094
8    9  01/01/2019 21:02      1107
9   10  01/01/2019 21:32      1120
10  11  01/01/2019 23:37      1133

df2 (14 rows):

    ID      lat        lon          DateTime
0   15  50.7823  90.564000   01/01/2019 0:03
1   16  51.4852  90.473600   01/01/2019 0:29
2   17  50.2981  90.387600   01/01/2019 3:53
3   18  50.3710  90.298667   01/01/2019 4:03
4   19  50.1289  90.210467   01/01/2019 5:03
5   20  49.8868  90.122267  01/01/2019 11:03
6   21  49.6447  90.034067  01/01/2019 13:03
7   22  49.4026  89.945867  01/01/2019 15:03
8   23  49.1605  89.857667  01/01/2019 19:03
9   24  48.9184  89.769467  01/01/2019 21:03
10  25  48.6763  89.681267  01/01/2019 22:03
11  26  48.4342  89.593067  01/01/2019 22:23
12  27  48.1921  89.504867  01/01/2019 23:03
13  28  47.9500  89.416667  01/01/2019 23:43

I need to join these two DataFrames based on the nearest date and time so that the joined Dataframe looks like this and meets these conditions:

df_join (11 rows)

all the events need to be joined with one location
one location can be joined with multiple events
some locations don't have an event to join to:

    ID      lat        lon          DateTime Event_no
0   15  50.7823  90.564000   01/01/2019 0:03 1003
1   16  51.4852  90.473600   01/01/2019 0:29 1015
2   16  51.4852  90.473600   01/01/2019 0:29 1029
3   17  50.2981  90.387600   01/01/2019 3:53 1042
4   17  50.2981  90.387600   01/01/2019 3:53 1055 
5   20  49.8868  90.122267  01/01/2019 11:03 1068
6   20  49.8868  90.122267  01/01/2019 11:03 1081
7   21  49.6447  90.034067  01/01/2019 13:03 1094
8   24  48.9184  89.769467  01/01/2019 21:03 1107
9   25  48.6763  89.681267  01/01/2019 22:03 1120
10  28  47.9500  89.416667  01/01/2019 23:43 1133

Following @jezrael's answer here I've written the following code:

import pandas as pd

df1 = pd.read_csv("path/filename1.csv")
df2 = pd.read_csv("path/filename2.csv")

df1['DateTime'] = pd.to_datetime(df1.DateTime)
df2['DateTime'] = pd.to_datetime(df2.DateTime)

df1.sort_values('DateTime', inplace=True)
df2.sort_values('DateTime', inplace=True)

df1_join = df1.set_index('DateTime').reindex(df2.set_index('DateTime').index, method='nearest').reset_index()

df1_merge = (pd.merge(df2, df1_join, on='DateTime'))
df1_merge.to_csv("path/filename_join.csv"")

The code runs through just fine but doesn't give me the results I need.

print(df1_join)
              DateTime  ID  Event_no
0  2019-01-01 00:03:00   1      1003
1  2019-01-01 00:29:00   2      1015
2  2019-01-01 03:53:00   5      1055
3  2019-01-01 04:03:00   5      1055
4  2019-01-01 05:03:00   5      1055
5  2019-01-01 11:03:00   6      1068
6  2019-01-01 13:03:00   8      1094
7  2019-01-01 15:03:00   8      1094
8  2019-01-01 19:03:00   9      1107
9  2019-01-01 21:03:00   9      1107
10 2019-01-01 22:03:00  10      1120
11 2019-01-01 22:23:00  10      1120
12 2019-01-01 23:03:00  11      1133
13 2019-01-01 23:43:00  11      1133

It doesn’t join each event to one location (events 1029, 1042 and 1081 missing)
The code allows one event to be joined with multiple locations (1055, 1094, 1107, 1120 and 1133)

Any advice on how to edit the code so that the previously conditions are met?

Maybe you need to share more data? It's working fine for me: `df1_merge` holds **the very same dataframe** as the one in your expected output. — , Nov 11 '21 at 01:32

Join two dataframes based on non-matching date and time using pandas and nearest-method

0 Answers0