I have a table of about 500 survey points and am looking for duplicates within a tolerance. The query below takes less than a second and returns 500 rows, most with a distance of zero because every point matches itself (PointA = PointB):
DECLARE @TOL AS REAL
SET @TOL = 0.05

SELECT
    PointA.ObjectId AS ObjectIDa,
    PointA.Name AS PTNameA,
    PointA.[Description] AS PTdescA,
    PointB.ObjectId AS ObjectIDb,
    PointB.Name AS PTNameB,
    PointB.[Description] AS PTdescB,
    ROUND(PointA.Geometry.STDistance(PointB.Geometry), 3) AS DIST
FROM CadData.Survey.SurveyPoint PointA
JOIN CadData.Survey.SurveyPoint PointB
    ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
    -- AND
    -- PointA.ObjectId <> PointB.ObjectId
ORDER BY ObjectIDa
If I uncomment the two lines near the bottom, I get 14 rows, but the execution time goes up to 14 seconds. That is not a big deal now, but it will be once the point table grows to tens of thousands of rows.
I apologize in advance if the answer is already out there. I did look, but being new I got lost reading posts that are way over my head.
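In case it matters: the Geometry column can take a spatial index. A minimal sketch of what I would create, assuming nothing exists yet (the index name and bounding box below are placeholders, not my real values):

CREATE SPATIAL INDEX SIdx_SurveyPoint_Geometry   -- placeholder name
ON Survey.SurveyPoint (Geometry)
USING GEOMETRY_GRID
WITH (BOUNDING_BOX = (0, 0, 10000, 10000));      -- placeholder extents, not my real ones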
ADDENDUM: ObjectID is a BIGINT and the primary key of the table, so I realized I could change the condition to
AND PointA.ObjectID > PointB.ObjectID
This now takes half the time and returns half the rows (7 rows in 7 seconds), and I no longer get mirrored duplicates (Point 4 is close to Point 8, followed by Point 8 is close to Point 4). The performance still concerns me, though: the table will become very large, so any performance issue now will become a real problem later.
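For anyone trying to reproduce the timings above, I measure each variant roughly like this; the SET statements are just instrumentation, not part of the query itself:

SET STATISTICS TIME ON;   -- reports CPU and elapsed time per statement
SET STATISTICS IO ON;     -- reports logical reads per table

-- ... paste one query variant here and run it ...

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;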
ADDENDUM 2: Swapping which condition sits in the JOIN's ON clause and which sits in a WHERE clause (as suggested), as below, makes no difference either.
DECLARE @TOL AS REAL
SET @TOL = 0.05

SELECT
    PointA.ObjectId AS ObjectIDa,
    PointA.Name AS PTNameA,
    PointA.[Description] AS PTdescA,
    PointB.ObjectId AS ObjectIDb,
    PointB.Name AS PTNameB,
    PointB.[Description] AS PTdescB,
    ROUND(PointA.Geometry.STDistance(PointB.Geometry), 3) AS DIST
FROM CadData.Survey.SurveyPoint PointA
JOIN CadData.Survey.SurveyPoint PointB
    ON PointA.ObjectId < PointB.ObjectId
WHERE PointA.Geometry.STDistance(PointB.Geometry) < @TOL
ORDER BY ObjectIDa
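One idea I have not tried yet is forcing the optimizer to use a spatial index with a table hint. A sketch of the idea, reusing the placeholder index name from my earlier sketch (SQL Server raises an error if it cannot build a plan honoring the hint):

DECLARE @TOL AS REAL = 0.05

SELECT PointA.ObjectId AS ObjectIDa,
       PointB.ObjectId AS ObjectIDb,
       ROUND(PointA.Geometry.STDistance(PointB.Geometry), 3) AS DIST
FROM CadData.Survey.SurveyPoint PointA
JOIN CadData.Survey.SurveyPoint PointB
    WITH (INDEX(SIdx_SurveyPoint_Geometry))  -- placeholder index name
    ON PointA.ObjectId < PointB.ObjectId
WHERE PointA.Geometry.STDistance(PointB.Geometry) < @TOL
ORDER BY ObjectIDa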
I find it fascinating that I can change @TOL to something large that returns over 100 rows with no change in performance, even though that requires many more distance computations, yet adding a simple ObjectID comparison to the join slows the query right down.
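My next step is to compare the estimated plans of the two variants without executing them, roughly like this (SHOWPLAN_XML must be the only statement in its batch, hence the GO separators):

SET SHOWPLAN_XML ON;
GO
-- paste one query variant here, then the other, and compare the two plans
GO
SET SHOWPLAN_XML OFF;
GO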