I have a program that needs to find the distance from each point in a set to a curve. The curve has a complicated shape, and the full version has 10^6 points. I have a working program that uses shapely for this, but it is too slow to run on the full data set. I have made a toy version here; the general idea is the same.
Is it possible to remove the loop? How can I make it faster?
import numpy as n
import shapely.geometry as geom
import matplotlib.pyplot as plt
import random
import pandas as pd
#make some random coordinates
coords=n.random.rand(100,2)*10.0
data = {'x':coords[:,0],
        'y':coords[:,1]}
data = pd.DataFrame(data)
#make a random curve
x=[0,1,2,3,4,7,10]
y=[0,0.5,5,8,9,5,10]
line_in=list(zip(x,y))  # materialise the pairs; zip() is a one-shot iterator in Python 3
line=geom.LineString(line_in)
#find the distance from the curve for each point and add to the dataframe
data["dists"]=n.nan
for i, row in data.iterrows():
    point = geom.Point(row["x"], row["y"])
    # use .loc to avoid chained assignment, which may write to a copy
    data.loc[i, "dists"] = line.distance(point)
#visualize the output coloured according to distance from curve
plt.scatter(data.x, data.y,marker=".",s=20,c=data.dists, vmin=0, vmax=2)
plt.plot(x,y)
plt.show()
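One loop-free direction would be to skip the per-point Shapely call and compute point-to-segment distances directly with NumPy broadcasting. The following is only a minimal sketch under some assumptions: the curve is a 2-D polyline with no zero-length segments, and `polyline_distances` is a hypothetical helper name, not part of Shapely or of the program above.

```python
import numpy as np

def polyline_distances(points, vertices):
    """Return the distance from each point (N, 2) to a polyline (M, 2).

    Hypothetical helper: assumes 2-D input and no repeated consecutive
    vertices (a zero-length segment would cause a division by zero).
    """
    points = np.asarray(points, dtype=float)
    vertices = np.asarray(vertices, dtype=float)

    a = vertices[:-1]                 # segment start points, shape (M-1, 2)
    ab = vertices[1:] - a             # segment direction vectors, shape (M-1, 2)
    ap = points[:, None, :] - a[None, :, :]   # point minus start, shape (N, M-1, 2)

    # Projection parameter of each point on each segment, clamped to [0, 1]
    # so the closest point stays on the segment rather than the infinite line.
    seg_len2 = np.einsum("mk,mk->m", ab, ab)
    t = np.einsum("nmk,mk->nm", ap, ab) / seg_len2
    t = np.clip(t, 0.0, 1.0)

    # Closest point on every segment, then the minimum over segments.
    closest = a[None, :, :] + t[..., None] * ab[None, :, :]
    dist = np.linalg.norm(points[:, None, :] - closest, axis=2)
    return dist.min(axis=1)
```

With the toy data this would be called as `data["dists"] = polyline_distances(data[["x", "y"]].to_numpy(), n.column_stack([x, y]))`. The intermediate arrays have shape N × (M-1), so for 10^6 points and a curve with many vertices the points would probably need to be processed in chunks.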