I initially implemented the Hoey-Shamos algorithm, but it is too complex for future maintainability (I have no say in this) and it wasn't reporting correctly, so I'm going to use an optimized brute-force algorithm instead.
My question is: How can I optimize this code to be usable?
As it stands, my code contains a nested loop that iterates over the same list twice.
EDIT: I turned lines into a HashSet and used two foreach loops, which shaved about 45 seconds off scanning 10,000 records. It's still not enough.
foreach (Line2D g in lines)
{
    foreach (Line2D h in lines)
    {
        if (g.intersectsLine(h))
        {
            return false;
        }
    }
} // end 'lines' foreach loop
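One cheap win even within brute force: the nested foreach above tests every ordered pair, so each pair of lines is checked twice and every line is also checked against itself. A minimal sketch of the same scan using indices so each unordered pair is tested only once (the lineList name and the use of System.Linq's ToList are mine, not from the original code):

    // Compare each unordered pair exactly once; assumes the HashSet is copied
    // into a List<Line2D> first (requires using System.Linq for ToList()).
    List<Line2D> lineList = lines.ToList();
    for (int i = 0; i < lineList.Count; i++)
    {
        for (int j = i + 1; j < lineList.Count; j++)
        {
            if (lineList[i].intersectsLine(lineList[j]))
            {
                return false;
            }
        }
    }

This roughly halves the number of intersectsLine calls, but it does not change the O(n²) growth that hurts the very large polygons.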
If I force my "intersectsLine()" method to return false (for testing purposes) it still takes 8 seconds to scan 10,000 records (I have 700,000 records). This is too long, so I need to optimize this piece of code.
I tried removing lines from the List after it's been compared to all the other lines, however there's an accuracy issue (no idea why) and the speed increase is barely noticeable.
Here is my intersectsLine method. I found an alternate approach here but it looks like it'd be slower with all the method calls and whatnot. Calculating the slope doesn't seem to me like it'd take too much computing (Correct me if I'm wrong?)
public bool intersectsLine(Line2D comparedLine)
{
    //tweakLine(comparedLine);

    // Skip the comparison when the two lines are the same segment or share an endpoint.
    if (this.Equals(comparedLine) ||
        P2.Equals(comparedLine.P1) ||
        P1.Equals(comparedLine.P2))
    {
        return false;
    }

    // Direction vectors of both segments.
    double firstLineSlopeX = X2 - X1;
    double firstLineSlopeY = Y2 - Y1;
    double secondLineSlopeX = comparedLine.X2 - comparedLine.X1;
    double secondLineSlopeY = comparedLine.Y2 - comparedLine.Y1;

    // Solve for the parameters s and t of the crossing point along each segment.
    double s = (-firstLineSlopeY * (X1 - comparedLine.X1) + firstLineSlopeX * (Y1 - comparedLine.Y1)) /
               (-secondLineSlopeX * firstLineSlopeY + firstLineSlopeX * secondLineSlopeY);
    double t = (secondLineSlopeX * (Y1 - comparedLine.Y1) - secondLineSlopeY * (X1 - comparedLine.X1)) /
               (-secondLineSlopeX * firstLineSlopeY + firstLineSlopeX * secondLineSlopeY);

    // The segments intersect when both parameters fall within [0, 1].
    if (s >= 0 && s <= 1 && t >= 0 && t <= 1)
    {
        //Console.WriteLine("s = {0}, t = {1}", s, t);
        //Console.WriteLine("this: " + this);
        //Console.WriteLine("other: " + comparedLine);
        return true;
    }

    return false; // No collision
}
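One thing worth noting about this method: s and t share the same denominator (the cross product of the two direction vectors), and when the segments are parallel that denominator is zero, so the divisions produce infinity or NaN and the range check silently returns false. A hedged sketch that computes the denominator once and rejects (near-)parallel segments explicitly; the epsilon value is my assumption, not from the original code:

    // Shared denominator: zero (or near zero) means the segments are parallel.
    double denominator = -secondLineSlopeX * firstLineSlopeY + firstLineSlopeX * secondLineSlopeY;
    if (Math.Abs(denominator) < 1e-10)
    {
        // Parallel or collinear segments: no single crossing point.
        // Collinear overlap would need a separate check if it matters for your data.
        return false;
    }
    double s = (-firstLineSlopeY * (X1 - comparedLine.X1) + firstLineSlopeX * (Y1 - comparedLine.Y1)) / denominator;
    double t = (secondLineSlopeX * (Y1 - comparedLine.Y1) - secondLineSlopeY * (X1 - comparedLine.X1)) / denominator;

Besides making the parallel case explicit, this also halves the divisions per call, which adds up over millions of comparisons.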
EDIT: The major bottleneck is the big polygons! I excluded polygons with more than 100 lines, and it ran all 700,000+ polygons in 5:10, which is in the acceptable range! Surely there's a way to see if a polygon is worth calculating before running this code? I have access to the XMin, XMax, YMin, and YMax properties if that helps?
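A polygon's overall XMin/XMax/YMin/YMax on its own can't tell you whether it self-intersects, but the same idea applied per pair of segments is a cheap reject test: two segments can only cross if their axis-aligned bounding boxes overlap. A minimal sketch (the method name is mine; it uses only the X1/X2/Y1/Y2 values already on Line2D):

    // Reject a pair early when the segments' bounding boxes don't overlap.
    public bool BoundingBoxOverlaps(Line2D other)
    {
        return Math.Min(X1, X2) <= Math.Max(other.X1, other.X2) &&
               Math.Min(other.X1, other.X2) <= Math.Max(X1, X2) &&
               Math.Min(Y1, Y2) <= Math.Max(other.Y1, other.Y2) &&
               Math.Min(other.Y1, other.Y2) <= Math.Max(Y1, Y2);
    }

Calling this before intersectsLine skips the division and parameter math for the many pairs of edges that are nowhere near each other, although the pair loop itself is still O(n²).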
I ran another test, limiting polygons to under 1,000 lines each. It took a little over an hour.
I removed the polygon size limit entirely, and it has been running for 36 hours now... still no results.
A couple of ideas I have:
-When I generate my lines HashSet, keep a second HashSet/List of candidates for intersection, and only add lines to it if there's a chance they intersect. But how do I add or eliminate possibilities? If only three lines could possibly intersect with the others, I'd be comparing 4,000 lines against 3 instead of against 4,000. This alone could make a HUGE difference (see the sketch after this list).
-If the same point occurs twice (other than the first and last point, which close the polygon), skip running the nested loop entirely.
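A cheap way to get most of the benefit of the candidates idea, without a full sweep-line implementation, is to sort the segments by their left (minimum X) endpoint and stop comparing as soon as the next segment starts to the right of the current one's right end. A sketch under those assumptions (sorted, rightEdge, and the use of System.Linq's OrderBy are names and choices of mine):

    // Sort segments by their minimum X; any two segments that cross must
    // overlap in X, so once a later segment starts past this one's right
    // edge, no remaining segment can intersect it either.
    var sorted = lines.OrderBy(l => Math.Min(l.X1, l.X2)).ToList();
    for (int i = 0; i < sorted.Count; i++)
    {
        double rightEdge = Math.Max(sorted[i].X1, sorted[i].X2);
        for (int j = i + 1; j < sorted.Count; j++)
        {
            if (Math.Min(sorted[j].X1, sorted[j].X2) > rightEdge)
                break; // no further X-overlap possible
            if (sorted[i].intersectsLine(sorted[j]))
                return false;
        }
    }

For typical polygons with short edges this prunes the vast majority of pairs; in the worst case (many long edges spanning the same X range) it degrades back to the full quadratic scan.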
Edit:
Information about the polygons: 700,000 total.
There are over four thousand polygons with 1,000 or more points.
There are two polygons with 70,000 or more points.
I think it's possible to get this down to fifteen or so minutes with a bit of creativity.