1

I have three tables (t1, t2, t3) all of which contain id, lat, lng. There are possible duplicate entries that I want to identify and possibly eliminate (if they are truly duplicates). The issue is that the lat and lng might not be EXACT matches, so the typical count(*) >1 method of finding duplicates will not work since I want to allow for a slight geolocation variance (say 250 feet or about 0.005 degrees in each lat and lng) in any given direction.

I found a similar question here, but it doesn't identify this exact issue and I wasn't able to extrapolate that code for my answer: All Lat Lng of a sql table within 15 Km to each Lat lng of a different table-sql 2008

I did find some code that works to identify EXACT duplicate lat/lng but it doesn't tell me which tables the duplicates are in so I can manually research them or to allow for even the slightest deviation in lat/lng and only returns one id, lat, lng, and count for each duplicate.

There could be "duplicates" that have lat's of 32.333 and 32.336 but the code below will not account for them, as it will see them as unique.

`SELECT id, lat, lng, COUNT( * ) 
    FROM (
        SELECT id, lat, lng
        FROM t1
    UNION ALL 
        SELECT id, lat, lng
        FROM t2
    UNION ALL 
        SELECT id, lat, lng
        FROM t3
    ) tbl
GROUP BY lat, lng
HAVING COUNT( * ) >1
ORDER BY lat;`

Ideally, the output should look something like this:

`t1.id  t2.id   t3.id
432     1087        <-- found 2 rows within 250 of each other in t1 & t2
12      832         <-- found 2 rows within 250 of each other in t1 & t3
88      654      789<-- found 3 rows within 250 of each other in t1,t2&t3`

An example of "close duplicates" would be:
32.332, -87.556
32.336, -87.560
Achraf Almouloudi
  • 756
  • 10
  • 27
SgoDiv
  • 11
  • 3
  • Could you work it out youself if you know how to [find distance between two points using latitude and longitude in mysql](https://stackoverflow.com/questions/24370975/find-distance-between-two-points-using-latitude-and-longitude-in-mysql)? – KIKO Software Apr 14 '19 at 05:40
  • I think that is a step in the right direction, and it would be easy if I was trying to compare them to a known location. But I want to compare them to each other. **Similar** to a "count(*)>1" method of finding exact duplicates. Thanks for the suggestion! – SgoDiv Apr 14 '19 at 15:11
  • [This is along the same lines](https://stackoverflow.com/questions/36834277/all-lat-lng-of-a-sql-table-within-15-km-to-each-lat-lng-of-a-different-table-sql?noredirect=1&lq=1) – SgoDiv Apr 14 '19 at 15:12
  • By adding a constant to the union all statements I am able to at least identify the table for each row: `SELECT id, lat, lng, COUNT( * ) FROM ( SELECT 't1' AS table_name, id, lat, lng FROM t1 UNION ALL SELECT 't2' AS table_name, id, lat, lng FROM t2 UNION ALL SELECT 't3' AS table_name, id, lat, lng FROM t3 )tbl GROUP BY lat, lng HAVING COUNT( * ) >1 ORDER BY id` – SgoDiv Apr 14 '19 at 17:23

0 Answers0