I have a program that needs to find the distance of each point from a curve. The curve is a complicated shape, and the full version has 10^6 points. My program works, using shapely to select the points, but it is too slow to run on the full version. Below is a toy version; the general idea is the same.

Is it possible to remove the loop? How can I make it faster?

import numpy as n
import shapely.geometry as geom
import matplotlib.pyplot as plt
import pandas as pd

#make some random coordinates 
coords=n.random.rand(100,2)*10.0
data = {'x':coords[:,0],
        'y':coords[:,1]}
data = pd.DataFrame(data)

#make a random curve
x=[0,1,2,3,4,7,10]
y=[0,0.5,5,8,9,5,10]
line_in=list(zip(x,y))  #materialize the pairs; zip is a one-shot iterator in Python 3

line=geom.LineString(line_in)

#find the distance from the curve for each point and add to the dataframe
data["dists"]=n.nan
for i, row in data.iterrows():
    point = geom.Point(row.x, row.y)
    #assign with .loc; chained assignment (data.dists[i]=...) may write to a copy
    data.loc[i, "dists"] = line.distance(point)
    
#visualize the output coloured according to distance from curve
plt.scatter(data.x, data.y,marker=".",s=20,c=data.dists, vmin=0, vmax=2)
plt.plot(x,y)
  • You can use `apply` to apply a function to a row at a time, but any operation on a million values is always going to be slow. – Tim Roberts Mar 08 '22 at 05:01
  • Thanks for the suggestion @TimRoberts - I’ll give that a go! I don’t mind a little slow… just trying to do the best I can :) – astrohuman Mar 08 '22 at 05:06
  • If you have enough memory, this should do the trick to remove the loop problem: https://stackoverflow.com/questions/34502254/vectorizing-haversine-distance-calculation-in-python – John Stud Mar 08 '22 at 05:11
  • @JohnStud: Vectorizing the distance from a (piecewise-defined) curve is rather more involved. – Davis Herring Jan 31 '23 at 01:34
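
As the last comment notes, the distance to a piecewise-linear curve does not vectorize as directly as a point-to-point formula. One possible approach, sketched below in plain NumPy, is to loop over the handful of curve segments instead of the million points: for each segment, project all points onto it at once, clamp the projection to the segment's ends, and keep the running minimum distance. The function name `distances_to_polyline` is hypothetical and not part of shapely; the sketch assumes 2-D Euclidean coordinates and a curve given as an ordered list of vertices, as in the toy example.

```python
import numpy as np

def distances_to_polyline(points, vertices):
    """Distance from each point, shape (N, 2), to a polyline with vertices of shape (M, 2)."""
    points = np.asarray(points, dtype=float)
    vertices = np.asarray(vertices, dtype=float)

    starts = vertices[:-1]                    # start of each segment, shape (M-1, 2)
    directions = vertices[1:] - starts        # vector along each segment
    len_sq = np.einsum("ij,ij->i", directions, directions)
    # Zero-length segments: avoid dividing by zero; they collapse to their start point.
    len_sq = np.where(len_sq == 0.0, 1.0, len_sq)

    best_sq = np.full(len(points), np.inf)
    # Loop over segments (few) rather than points (many).
    for start, direction, l2 in zip(starts, directions, len_sq):
        rel = points - start
        # Position of the projection along the segment, clamped to [0, 1].
        t = np.clip(rel @ direction / l2, 0.0, 1.0)
        nearest = start + t[:, None] * direction
        dist_sq = np.sum((points - nearest) ** 2, axis=1)
        np.minimum(best_sq, dist_sq, out=best_sq)
    return np.sqrt(best_sq)

# Toy data in the same spirit as the question (seeded for repeatability).
x = [0, 1, 2, 3, 4, 7, 10]
y = [0, 0.5, 5, 8, 9, 5, 10]
vertices = np.column_stack([x, y])
rng = np.random.default_rng(0)
coords = rng.random((100, 2)) * 10.0
dists = distances_to_polyline(coords, vertices)
```

Memory use stays proportional to the number of points because only one segment is processed at a time. The result could then be stored with `data["dists"] = dists`. It would be worth comparing it against `line.distance` on a small sample before trusting it on the full data set.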
