0

The mySQL database I use currently has two tables: One called source and the other called siteinfo.

The siteinfo table is as shown:

+-----------+----------+----------+
| longitude | latitude | sitecode |
+-----------+----------+----------+
|    1.3009 | 1.900989 |        2 |
+-----------+----------+----------+
|    1.7034 | 1.20034  |        3 |
+-----------+----------+----------+

likewise, the source table is as shown:

+-----------+----------+----------+
| longitude | latitude | sitecode |
+-----------+----------+----------+
|    1.3009 | 1.900989 |   NULL   |
+-----------+----------+----------+

What I basically aim to do is:

For each row in the source table, I want to take the columns called Longitude and Latitude and compare these to the corresponding Longitude and Latitude columns in the siteinfo table.

Those rows in the Source table where the Long and Lat rows are most similar to those in the siteinfo table then have the sitecode column updated with the corresponding sitecode of the sitecode column in the siteinfo table.

For example, the long and lat values in the source table are closest in value to those in the first row of the siteinfo table and the sitecode is thus updated to 2.

To tackle this, I know I have to create two ResultSets and use while(next()) to iterate through each row of each resultset. Question is, how do I compare to find the most similar entries?

I thought of doing it brute force, using a For loop to compare say, the first row entries of the Source Resultset with every single row of the Siteinfo Resultset then increment to the second row.

However, I understand resultset work by next() function to iterate each row, which isn't the same as a for loop since I want to compare the FIRST row of one resultset against every row of another.

Second, how do I find the difference of a column entry of two result tables?

Gord Thompson
  • 116,920
  • 32
  • 215
  • 418
Mack
  • 117
  • 2
  • 9
  • 2
    You should probably look if MySQL has support for spatial data types and queries. Then you could just do it with a single query to the server, instead of combining clumsy Java code to do what you want. – Kayaman Sep 21 '17 at 10:25

1 Answers1

1

No, you don't have to create two ResultSets. You can just create a MySQL function to calculate the Great Circle Distance between two points ...

CREATE FUNCTION great_circle_km (lon1 DOUBLE, lat1 DOUBLE, lon2 DOUBLE, lat2 DOUBLE)
RETURNS DOUBLE
RETURN 6371 * acos(cos(radians(lat1)) * cos(radians(lat2)) * cos(radians(lon2) - radians(lon1)) + sin(radians(lat1)) * sin(radians(lat2)))

... and then use an UPDATE query like so

UPDATE `source` SET `source`.`sitecode` = (
    SELECT `sitecode` FROM `siteinfo`
    ORDER BY great_circle_km(`source`.`longitude`, `source`.`latitude`, `siteinfo`.`longitude`, `siteinfo`.`latitude`)
    LIMIT 1
    )

Note that the above query will perform the equivalent of a CROSS JOIN (cartesian product) so the number of effective rows processed will increase dramatically for larger source tables. For example, using two 500-row tables will effectively process 250,000 rows, and using two 5,000-row tables will effectively process 25,000,000 rows.

If there could be a significant number of exact matches between the tables then it would probably be faster to update them first ...

UPDATE 
    `source` 
    INNER JOIN 
    `siteinfo` 
        ON `source`.`longitude` = `siteinfo`.`longitude` 
            AND `source`.`latitude` = `siteinfo`.`latitude`
SET `source`.`sitecode` = `siteinfo`.`sitecode`

... and then update the remaining rows

UPDATE `source` SET `source`.`sitecode` = (
    SELECT `sitecode` FROM `siteinfo`
    ORDER BY great_circle_km(`source`.`longitude`, `source`.`latitude`, `siteinfo`.`longitude`, `siteinfo`.`latitude`)
    LIMIT 1
    )
WHERE `source`.`sitecode` IS NULL
Gord Thompson
  • 116,920
  • 32
  • 215
  • 418
  • Worked perfectly and so much more efficient! – Mack Sep 22 '17 at 02:27
  • By the way, i found that analysis of larger tables(5000 rows) took quite a long time(about 2 minutes), any suggestions on improving the time efficiency? – Mack Sep 22 '17 at 04:08