0

For this project, I have two dataframes one called df1 and another called df2. These dataframes are not the same size (don't think that matters).

Each of them have a datetime in the first column. What I am trying to do is:

I want to make a new dataframe df3 that contains the remaining data from the df2 but only for the when the original times in the first column are closest to the df1.

Here is an example of what the dataframes might look like:

 print (df1)
 Output:
                 Date       Val
 0 2015-02-24 00:00:02  1.764052
 1 2015-02-24 00:01:15  0.400157
 2 2015-02-24 00:02:22  0.978738
 3 2015-02-24 00:03:39  2.240893
 4 2015-02-24 00:04:00  1.867558

 print (df2)
 Output:
                 Date       Val      Name  
 0 2015-02-24 00:00:00  -0.977278    John
 1 2015-02-24 00:01:00   0.950088    Robert
 2 2015-02-24 00:02:00  -0.103219    Sam
 3 2015-02-24 00:03:00   0.151357    Tim
 4 2015-02-24 00:04:00   0.410599    Hector 
 5 2015-02-24 00:05:00   0.673247    Melissa 

Then what I want to be able to retrieve is something like this:

print (df2)
 Output:
                 Date       Val      Name  
 0 2015-02-24 00:00:00  -0.977278    John
 1 2015-02-24 00:01:00   0.950088    Robert
 2 2015-02-24 00:02:00  -0.103219    Sam
 3 2015-02-24 00:04:00   0.410599    Hector
 4 2015-02-24 00:04:00   0.410599    Hector 
 5 2015-02-24 00:05:00   0.673247    Melissa 

I have searched around a bit and found that there were two similar posts on here example-1 example-2, but the difference being they just want a single value returned or a single row. For my purposes I want it to be 'Filtered' so to speak.

If anyone can provide any insight, that would be greatly appreciated, thank you.

Rob Burry
  • 1
  • 1

1 Answers1

0

If I understand correctly, I believe this gives you what you're looking for.

df1['df2_idx'] = df1.Date.apply(lambda x: [(abs(df2['Date'] - x)).idxmin()][0])
df3 = df2.reindex(df1['df2_idx'], axis=0).reset_index().drop(['df2_idx'], axis=1)

The first line just finds the row in df2 that is closest to each row in df1 and appends the indices as a column to df1 like this:

                 Date       Val  df2_idx
0 2015-02-24 00:00:02  1.764052        0
1 2015-02-24 00:01:15  0.400157        1
2 2015-02-24 00:02:22  0.978738        2
3 2015-02-24 00:03:39  2.240893        4
4 2015-02-24 00:04:00  1.867558        4

The second row then just subsets df2 with the column of indices. The final output is

                 Date       Val    Name
0 2015-02-24 00:00:00 -0.977278    John
1 2015-02-24 00:01:00  0.950088  Robert
2 2015-02-24 00:02:00 -0.103219     Sam
3 2015-02-24 00:04:00  0.410599  Hector
4 2015-02-24 00:04:00  0.410599  Hector
Frodnar
  • 2,129
  • 2
  • 6
  • 20