I have three tables (t1, t2, t3) all of which contain id, lat, lng. There are possible duplicate entries that I want to identify and possibly eliminate (if they are truly duplicates). The issue is that the lat and lng might not be EXACT matches, so the typical count(*) >1 method of finding duplicates will not work since I want to allow for a slight geolocation variance (say 250 feet or about 0.005 degrees in each lat and lng) in any given direction.
I found a similar question here, but it doesn't identify this exact issue and I wasn't able to extrapolate that code for my answer: All Lat Lng of a sql table within 15 Km to each Lat lng of a different table-sql 2008
I did find some code that works to identify EXACT duplicate lat/lng but it doesn't tell me which tables the duplicates are in so I can manually research them or to allow for even the slightest deviation in lat/lng and only returns one id, lat, lng, and count for each duplicate.
There could be "duplicates" that have lat's of 32.333 and 32.336 but the code below will not account for them, as it will see them as unique.
`SELECT id, lat, lng, COUNT( * )
FROM (
SELECT id, lat, lng
FROM t1
UNION ALL
SELECT id, lat, lng
FROM t2
UNION ALL
SELECT id, lat, lng
FROM t3
) tbl
GROUP BY lat, lng
HAVING COUNT( * ) >1
ORDER BY lat;`
Ideally, the output should look something like this:
`t1.id t2.id t3.id
432 1087 <-- found 2 rows within 250 of each other in t1 & t2
12 832 <-- found 2 rows within 250 of each other in t1 & t3
88 654 789<-- found 3 rows within 250 of each other in t1,t2&t3`
An example of "close duplicates" would be:
32.332, -87.556
32.336, -87.560