
I have two DataFrames: the first contains around 10k rows of coordinates, and the second contains around 2 million rows of coordinates with information associated with them.

I need to compare the 10k coordinates in df1 against the 2 million in df2, find the closest coordinate match for each, and copy the information from the matching row in df2 into df1.

All the solutions I could think of so far would require iterating over the rows, which would take a very long time. Is there a more time-efficient way to do this?

Here is an example of the inputs:

>>> df1
   name     lat        long
1    a    51.5068   -0.0733794  
>>> df2
      lat          long        value
1  51.078541    -0.066799      1000
2  55.056743    -2.127532       50

After execution, I need to get output like this:

>>> df1
   name     lat        long        value
1    a    51.5068   -0.0733794      1000
Ana
  • What do you mean by closest coordinate? Are you using a near match? This could be a major obstacle to the optimization. – DiMithras Nov 29 '22 at 15:34
  • Yes, I would need the nearest match, so they are not necessarily the same, and it's calculated by both latitude and longitude. Basically, matching each location from df1 to the nearest geographical area defined in df2. – Ana Nov 29 '22 at 15:39
  • If I understood your question correctly, you might want to crop the `float`s to some appropriate precision and then use something like `pandas.DataFrame.join` or `pandas.DataFrame.merge` on the resulting `DataFrame`s – alphamu Nov 29 '22 at 15:45
  • here is my answer to a completely similar question https://stackoverflow.com/a/74598075/16591526 – padu Nov 29 '22 at 15:58
  • thank you, didn't see this before! – Ana Nov 29 '22 at 16:10
  • I disagree with closing the question, as the answer https://stackoverflow.com/questions/38965720/find-closest-point-in-pandas-dataframes given is not optimal for a large dataset, as in the use case. – padu Nov 29 '22 at 16:21
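
For reference, here is a minimal sketch of one way the nearest-match lookup described above could be done without a Python-level loop. It is not taken from either linked answer. It assumes SciPy is available and uses `scipy.spatial.cKDTree`; the helper names `to_unit_xyz` and `attach_nearest` are hypothetical, and the column names `lat`, `long`, and `value` are taken from the example in the question. Latitude and longitude are converted to points on the unit sphere, so the nearest point by straight-line distance is also the nearest by great-circle distance.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree


def to_unit_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to points on the unit sphere."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    return np.column_stack((
        np.cos(lat) * np.cos(lon),
        np.cos(lat) * np.sin(lon),
        np.sin(lat),
    ))


def attach_nearest(df1, df2, cols=("value",)):
    """Copy `cols` from the geographically nearest df2 row onto each df1 row.

    Hypothetical helper: assumes both frames have 'lat' and 'long' columns.
    """
    # Build the tree once over the large frame, then query all df1 points at once.
    tree = cKDTree(to_unit_xyz(df2["lat"], df2["long"]))
    _, idx = tree.query(to_unit_xyz(df1["lat"], df1["long"]), k=1)
    out = df1.copy()
    for col in cols:
        # idx holds positions (not labels) into df2, so index the raw array.
        out[col] = df2[col].to_numpy()[idx]
    return out


# The example frames from the question.
df1 = pd.DataFrame({"name": ["a"], "lat": [51.5068], "long": [-0.0733794]}, index=[1])
df2 = pd.DataFrame(
    {"lat": [51.078541, 55.056743], "long": [-0.066799, -2.127532], "value": [1000, 50]},
    index=[1, 2],
)
result = attach_nearest(df1, df2)
print(result)
```

The tree is built once over df2 and queried for all df1 rows in a single call, so the per-row work happens in compiled code. Note that cropping the floats and merging, as suggested in the comments, would only pair rows whose cropped coordinates are identical, which is not the same as finding the nearest point.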

0 Answers