I'm new to Python, NumPy and Jupyter notebooks.

I have two CSV files containing a route in X,Y coordinates.

For each (X,Y) pair in dataset 1, I want to find the closest point and distance in the second dataset of (X,Y) coordinates. Any assistance would be greatly appreciated.

I can load in the dataset and access the data to plot the two routes, but I'm having difficulty doing this next step.

  • I'll be honest and say that it does appear to do what I want to do, but I'm not sure how I access the columns of the dataframes in the example. – Andrew Lindsay Sep 13 '22 at 13:00
  • Try adding a little more detail to your question. Explain how you loaded the data (it sounds like you may have used Pandas `read_csv`, but I can't be sure). Tell us as specifically as you can which bit of the linked answer you're having trouble implementing in your instance. – s_pike Sep 13 '22 at 13:08
  • How would I create the combined_x_y_arrays and points_list in the example you've provided from my two pandas dataframes (Route1, the source coordinates, and Route2, the dataframe I want to find the closest point in)?
        Route1 = pd.read_csv('C:/Route1.csv', header=None)
        Route2 = pd.read_csv('c:/Route2.csv', header=None)
    
    – Andrew Lindsay Sep 13 '22 at 13:10
  • Sorry, but I can't seem to get the code to format neatly. – Andrew Lindsay Sep 13 '22 at 13:16
  • This is approximately what you want:
        import pandas as pd
        import numpy as np
        from scipy.spatial import KDTree
        points_1 = pd.DataFrame({'x1': np.arange(5), 'y1': np.arange(5)})
        points_2 = pd.DataFrame({'x2': np.random.rand(5)*5, 'y2': np.random.rand(5)*5})
        my_tree = KDTree(points_2)
        dist, nearest_idx = my_tree.query(points_1)
        nearest_point_2 = points_2.iloc[nearest_idx,:].reset_index(drop=True)
        pd.concat([points_1, nearest_point_2], axis=1)
    
    – s_pike Sep 13 '22 at 13:19
  • OK, so I think I have made some headway. I have managed to get the dataframes to load, though for some reason I had to keep the 'points' as columns, where the example provided had the points as two rows. Not sure what's going on there. But now I have the index of the nearest point in the opposite list. More digging, as I now have to find out how to calculate the distance. – Andrew Lindsay Sep 13 '22 at 13:32
  • The distance is returned when you run `my_tree.query(point_list)`. You get a tuple returned from that, the first element being the distance. So if you do `dist, idx = my_tree.query(point_list)` you get the two things you want. – s_pike Sep 13 '22 at 15:00
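
Pulling the comments above together, here is a minimal sketch of the whole lookup. It assumes each CSV has exactly two columns (X, then Y) and no header row, as in the `read_csv(..., header=None)` calls quoted earlier. The small inline dataframes stand in for Route1 and Route2, and the function name `nearest_points` and the output column names are made up for illustration. `KDTree` expects one row per point, i.e. an array of shape (n, 2), which may be why the points had to be kept as columns of the dataframe.

```python
import numpy as np
import pandas as pd
from scipy.spatial import KDTree

# Hypothetical stand-ins for the two routes. With real files this would be
# something like pd.read_csv(path, header=None), giving columns 0 (X) and 1 (Y).
route1 = pd.DataFrame([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
route2 = pd.DataFrame([[0.0, 1.0], [2.0, 2.5], [5.0, 5.0]])


def nearest_points(source, target):
    """For each (X, Y) row in source, find the closest row in target."""
    # Build the tree on the dataset being searched; one row per point.
    tree = KDTree(target.to_numpy())
    # query returns (distances, indices), one entry per source point.
    dist, idx = tree.query(source.to_numpy())

    nearest = target.iloc[idx].reset_index(drop=True)
    result = source.reset_index(drop=True).copy()
    result.columns = ["x1", "y1"]  # assumes exactly two columns
    result["x2"] = nearest.iloc[:, 0].to_numpy()
    result["y2"] = nearest.iloc[:, 1].to_numpy()
    result["nearest_idx"] = idx    # row position in target
    result["distance"] = dist      # Euclidean distance to that row
    return result


result = nearest_points(route1, route2)
print(result)
```

Each output row holds a point from the first route, its closest point in the second route, the row position of that point, and the Euclidean distance between them. If the coordinates are latitude/longitude rather than planar X,Y, a plain Euclidean distance would not be meaningful and the data would need projecting first.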
