Efficient calculation of distance based on information in other dataframe (possible without loop?)

Question

I have 2 quite large dataframes (>>100,000,000 of rows). I would like to calculate a column sum_distance in first dataframe based on data in second dataframe

first dataframe df1 (simplified and should be readable with pd.read_clipboard(sep='\s\s+')):

        vehicle  month_start  month_end
0  577-1    201201    201203
1  577-1    201203    201207
2  577-1    201203    201207
3  577-1    201203    201207
4  577-1    201204    201207
5  577-1    201205    201207
6  577-1    201206    201207
7  577-1    201206    201209
8  577-1    201207    201209
9  577-1    201207    201210

second dataframe df2:

    vehicle  Year  Month Distance
0  577-1  2012  1   13680.0
1  577-1  2012  2   13295.0
2  577-1  2012  3   13010.0
3  577-1  2012  4   16267.0
4  577-1  2012  5   11825.0
5  577-1  2012  6    9739.0
6  577-1  2012  7   14290.0
7  577-1  2012  8   14117.0
8  577-1  2012  9   10687.0
9  577-1  2012  10   14504.0

I am looking for efficient way how to calculate column in df1 "sum_distance" (sum of distance for rows in df2 where year/month is between month_start and month_end), if possible without looping in df1. What would be the most efficient way?

expected output df1

        vehicle  month_start  month_end  sum_distance
0  577-1    201201    201203    26975.0
1  577-1    201203    201207    50841.0
2  577-1    201203    201207    50841.0
3  577-1    201203    201207    50841.0
4  577-1    201204    201207    37831.0
5  577-1    201205    201207    21564.0
6  577-1    201206    201207    9739.0
7  577-1    201206    201209    38146.0
8  577-1    201207    201209    28407.0
9  577-1    201207    201210    39094.0

Please take a while to read [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Corralien, Jul 16 '21 at 22:01
`pd.read_image` does not exist for us to be able to run this program we'd have to manually transcribe the data from this image. Not only is this time consuming its also prone to simple mistakes. Please review @Corralien 's link about the format of information that is expected. — Henry Ecker, Jul 16 '21 at 22:04

Efficient calculation of distance based on information in other dataframe (possible without loop?)

0 Answers0