1

My table coordinates is:

id (PK) SERIAL, address (VARCHAR), city (VARCHAR), latitude (VARCHAR), longitude (VARCHAR)

I have a list of 10,000 addresses which I'm looping through to retrieve the corresponding latitude/longitude if the address exists in the table.

The SELECT query looks like:

SELECT ADDRESS, CITY, LATITUDE, LONGITUDE FROM coordinates WHERE address = ? AND city = ?

I then check the result set to see if there was a match. The other 9,999 addresses are then looped through the query above which takes far too long.

Is there a way to create ONE query that contains all 10,000 addresses and returns a result set with 4 columns: |address|city|latitude|longitude|

Any address not found should have an empty (0 or null) lat/lon value in that result set column. I'm assuming I could then loop the result set to identify anything missed, much faster than 10,000 statements.

What's the best approach?

Mathomatic
  • 899
  • 1
  • 13
  • 38
  • Why do you do both `=` and `LIKE`? The condition `x LIKE '%FOO%'` includes all records where `x = 'FOO'` would be true. – Andreas Nov 16 '18 at 22:31
  • *"I'm performing 10,000 SELECT queries"* But you also said *"here's 1/4 of it"* and showed a statement with 3 SELECT clauses, so if 3 is 1/4, then you're doing 12 SELECT queries, not 10,000. – Andreas Nov 16 '18 at 22:32
  • No, you're misunderstanding which is likely my fault. I'll reword it: the `SELECT` query with `UNION` involved is 12 queries.... but it's purpose is to find ONE address/coordinate match in my table. The 12 queries account for the variance in addresses like I explained. I have `10,000 addresses` each which will pass through the 12-query `SELECT/UNION` block. Understand? – Mathomatic Nov 16 '18 at 22:34
  • So, instead of looping all 10,000 addresses and individually putting their address/city into the SELECT/UNION query... I want ONE query that will include all 10,000 addresses so it's much faster. – Mathomatic Nov 16 '18 at 22:37

2 Answers2

1

UPDATE: Changed to include unmatched "query" addresses in the result, and added QUERY_ID to help identify "query" addresses, which could be just an index in the list supplying the query parameters.

Since it is PostgreSQL, you can use the VALUES clause, e.g.

SELECT q.QUERY_ID
     , c.ADDRESS
     , c.CITY
     , c.LATITUDE
     , c.LONGITUDE
  FROM (VALUES (1, ?, ?)
             , (2, ?, ?) // repeat as many times as needed
             , (3, ?, ?)
       ) AS q (QUERY_ID, ADDRESS, CITY)
  LEFT JOIN coordinates AS c
       ON  c.ADDRESS LIKE q.ADDRESS
       AND c.CITY LIKE q.CITY

Now you need to loop through your 10000 addresses and set all the values for that PreparedStatement.

Andreas
  • 154,647
  • 11
  • 152
  • 247
  • thank you, this looks promising. Would you mind explaining what this query is doing, as I'm confused what `t` and `c` are. Also, unmatched addresses must be included in the result set as empty lat/lon so I can identify which ones aren't matched. – Mathomatic Nov 17 '18 at 00:38
  • This works, and is certainly faster. Thanks a lot Andreas – Mathomatic Nov 18 '18 at 01:57
1

The query length might be an issue for 10,000 pairs. Also, if you absolutely need to do it in one hit, you can try this.

Insert the values into a temp table. Then:

SELECT tbl.val1, tbl.val2
FROM tbl
WHERE (tbl.val1, tbl.val2) in (select tmp_table.val1, tmp_table.val2 from tmp_table);

Also alternatively a JOIN, instead of IN.

Update:

(1) Inserts would have to be one per line. Probably create a script file with some COMMIT; statements once in few hundred rows. If it is done from Java you could use JDBC addBatch to commit once in 500/1000 rows.

--Create temp table before this with two columns address (VARCHAR), city (VARCHAR)
INSERT INTO TMP_TABLE VALUES ( ?, ?); //Prepare statement and addBatch

(2, 3, 4)

SELECT ADDRESS, CITY, LATITUDE, LONGITUDE
FROM coordinates
WHERE (ADDRESS, CITY) in (select tmp_table.address, tmp_table.city from tmp_table);
Teddy
  • 4,009
  • 2
  • 33
  • 55
  • Thanks, I don't *need* to do it in one query if there's an alternative approach that's faster. Your answer leaves me wondering (1) what the `insert to temp table` code looks like (2) what your code is actually doing. (3) what val1 and val2 correspond to in my question. (4) My query response needs 4 columns (address, city, lon, lat) not just 2 (val1, val2). – Mathomatic Nov 17 '18 at 00:43
  • @Andreas is loading the whole 10,000 into a virtual table right within the query and doing a join. If that works, it might be the better option. Even otherwise, you can try his approach 500 or 1000 at a time. But, huge queries can become unpredictable as the length of the string can vary. My method is relatively safer, but it might be slower. But, it still would be faster than 10,000 individual select queries. So, kind of a mid-performance approach. – Teddy Nov 17 '18 at 00:54
  • Thanks. Why would length of the string varying matter? If I use a StringBuilder to build that prepared statement, and then set the appropriate number of parameters, I don't see where string length comes into play. – Mathomatic Nov 17 '18 at 01:04
  • 1
    A 1MB query string, and then a PreparedStatement should be just fine. A rare chance that one fine day it might throw an OutOfMemoryError. If you batch it by row count, and each row can vary in size, the query will vary in size. 1000 rows could be 10KB in one run or could be 50KB in the next run. – Teddy Nov 17 '18 at 01:18