Finding shortest distance of every point to a non-straight line in Python

Question

I have created figures similar to this one here:

file in question

My goal here is to take each blue point and calculate the shortest distance it would take to get to any point on the red line. Ideally, this could be used to select the x% closest points or those falling within a certain distance, but the primary issue here is calculating each distance in the first place.

The points were taken from a data file and plotted as such:

data = np.loadtxt('gr.dat') ... ax.scatter(data[:,0],data[:,1])

whereas the red line is a calculated Baraffe track where all points used to create the line were stored in a dat file and plotted via:

df = pd.read_csv('baraffe.dat', sep=r"\s+", names=['mass', 'age', 'g', 'r', 'i'])
df2 = pd.DataFrame(df, columns=["mass", "age", "g", "r", "i"])
df2['b_color'] = df2['g'] - df2['r']
df2.plot(ax=ax, x='b_color',y='g', color="r")
...

This is my first attempt at using pandas so I know my code could definitely be optimized and is likely redundant, but it does output the figure attached.

Essentially, I want to calculate the smallest distance each dot would have to move (in both x and y) to reach any point on the red line. I did try and mimic the answer in (here) but I am unsure how to apply that definition to a dataframe or larger array without always getting a TypeError. If there is any insight to this I would greatly appreciate it, and thank you!

For clarification: Would it be acceptable to approximate your line by mathematical straight line - or do you have many saved points and want to use only them? — Daraan, Feb 06 '23 at 20:49
With more data, it's possible to give an answer. However a fast solution is to use `BallTree` from `sklearn` to compute the distance between each red points and blue points. — Corralien, Feb 06 '23 at 21:24
Talking of a distance, when you have different physical quantities on the two axes, is a bit of a stretch, do you just want `d = sqrt(ΔM²+ΔC²)` or do you want to apply some sort of scaling to one of the variables? — gboffi, Feb 07 '23 at 00:05

gboffi · Answer 1 · 2023-02-07T00:11:03.243

Use scipy.spatial.KDTree.

Once you have built the KDTree on the points of the Baraffe track, you can use the different methods of the KDTree instance to compute all the quantities that interest you.

Here, for simplicity, I have just shown how to use the query method to pair each scattered point with its nearest neighbor among the points of the curve.

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import KDTree
np.random.seed(20230307)

x = np.linspace(0, 10, 51)
y = np.sin(x)*0.7
x, y = +x*0.6+y*0.8, -0.8*x+0.6*y

xp = np.linspace(1, 9, 21)
yp = -1+np.random.rand(21)*0.4
xp, yp = +xp*0.6+yp*0.8, -0.8*xp+0.6*yp

kdt = KDTree(np.vstack((x, y)).T) # the array that is indexed must be N×2
distances, indices = kdt.query(np.vstack((xp, yp)).T, k=1)

fig, ax = plt.subplots()
ax.set_aspect(1)

ax.plot(x, y, color='k', lw=0.8)
ax.scatter(xp, yp, color='r')
# connect each scattered point to its nearest point on the curve
for x0, y0, i in zip(xp, yp, indices):
    ax.plot((x0, x[i]), (y0, y[i]), color='g', lw=0.5)
plt.show()
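
The question also asks about selecting the points within a certain distance, or the closest x%. A minimal sketch of how that might look follows, with the caveat that `query` returns the distance to the nearest stored point of the track, not to the line drawn between stored points. To approximate the latter, the sketch resamples the track linearly before building the tree. The helper name `distances_to_track`, the `samples_per_segment` parameter and the small arrays are all made up for illustration. With the question's data one would presumably pass something like `df2[['b_color', 'g']].to_numpy()` as the track and `data[:, :2]` as the points, but that depends on the column layout of the files. Distances are plain Euclidean in plot units, with no rescaling of either axis.

```python
import numpy as np
from scipy.spatial import KDTree

def distances_to_track(points, track, samples_per_segment=20):
    """Approximate distance from each row of points (M×2) to the polyline track (N×2).

    Hypothetical helper: the polyline is resampled linearly so that the nearest
    vertex found by the KDTree is close to the nearest point on the line.
    """
    points = np.asarray(points, dtype=float)
    track = np.asarray(track, dtype=float)
    # parameter t runs 0..N-1 over the stored vertices; interpolate x and y over it
    t = np.arange(len(track))
    t_dense = np.linspace(0, len(track) - 1,
                          (len(track) - 1) * samples_per_segment + 1)
    dense = np.column_stack((np.interp(t_dense, t, track[:, 0]),
                             np.interp(t_dense, t, track[:, 1])))
    distances, _ = KDTree(dense).query(points, k=1)
    return distances

# synthetic stand-ins for the track and the scattered points (assumed data)
track = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
points = np.array([[0.5, 0.3], [1.2, 0.5], [3.0, 1.0], [0.0, -0.1]])

d = distances_to_track(points, track)

# points closer than a fixed threshold (here 0.25, in plot units)
within = points[d <= 0.25]

# the closest 50% of the points, using a percentile of the distances as cutoff
closest_half = points[d <= np.percentile(d, 50)]
```

Increasing `samples_per_segment` tightens the approximation at the cost of a larger tree. If the track is already densely sampled, the resampling step can be skipped and the tree built on the stored points directly, as in the code above.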
