39

I have a database with four columns corresponding to the geographical coordinates x,y for the start and end position. The columns are:

  • x0
  • y0
  • x1
  • y1

I have an index for these four columns with the sequence x0, y0, x1, y1.

I have a list of about a hundred combination of geographical pairs. How would I go about querying this data efficiently?

I would like to do something like this as suggested on this SO answer but it only works for Oracle database, not MySQL:

SELECT * FROM my_table WHERE (x0, y0, x1, y1) IN ((4, 3, 5, 6), ... ,(9, 3, 2, 1));

I was thinking it might be possible to do something with the index? What would be the best approach (ie: fastest query)? Thanks for your help!

Notes:

  • I cannot change the schema of the database
  • I have about 100'000'000 rows

EDIT: The code as-is was actually working, however it was extremely slow and did not take advantage of the index (as we have an older version of MySQL v5.6.27).

nbeuchat
  • 6,575
  • 5
  • 36
  • 50
  • That should work fine in MySQL, have you tried? The first comment I see that says that in the question you linked was from 5 years ago. – Uueerdo Jun 22 '17 at 17:59
  • 2
    Just so you know, you **can** do this in MySQL. See my test: http://sqlfiddle.com/#!9/7b5c1/1 – sorayadragon Jun 22 '17 at 18:16

5 Answers5

37

To make effective use of the index, we could rewrite the IN predicate

example

(x0, y0, x1, y1) IN ((4, 3, 5, 6),(9, 3, 2, 1))

Like this:

(  ( x0 = 4 AND y0 = 3 AND x1 = 5 AND y1 = 6 ) 
OR ( x0 = 9 AND y0 = 3 AND x1 = 2 AND y1 = 1 )
)

EDIT

Newer versions of MySQL optimizer fix the performance problem; generate execution plans that make more effective use of available indexes.

The (a,b) IN ((7,43),(7,44),(8,1)) syntax has been supported in MySQL many versions back, but there were performance problems with it (at least with with non-trivial sets) because of the suboptimal execution plan generated by the optimizer.

But the optimizer has been improved in newer versions of MySQL; the newer optimizer can generate more efficient execution plans.

Note a similar related problem with OR constructs. Here's an example query intended to get the "next page" of 20 rows ordered by columns seq and sub (unique tuple). The last fetched page (seq,sub)=(7,42)

With much older versions of MySQL, this syntax would not be accepted

 WHERE (seq,sub) > (7,42)
 ORDER BY seq, sub
 LIMIT 20

And when MySQL did support the syntax, we would get an execution plan like if we had written

WHERE ( seq > 7 ) 
   OR ( seq = 7 AND sub > 42 ) 
ORDER BY sub, seq
LIMIT 20

we would get a much more efficient the execution plan if we instead write something subtly different:

WHERE ( seq >= 7 ) 
  AND ( seq > 7 OR sub > 42 ) 
ORDER BY sub, seq
LIMIT 20

and we would get a much better plan from the MySQL optimizer. we'd expect the optimizer plan to use available UNIQUE INDEX on (sub,seq), and return rows in index order from a range scan operation...

spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • 2
    Your solution is much faster. With our version of MySQL, A single query (100'000'000 rows, 10 elements in the list) took 3.14 seconds with your solution vs 1427 seconds with the `IN` syntax. – nbeuchat Jun 22 '17 at 19:42
  • The query pattern suggested by @GordonLinoff might be even faster, concatenating results of separate SELECT statements with the `UNION ALL` set operator. Likely the EXPLAIN for that will show queries with "ref" and and "const", rather than "range". That pattern will definitely make use of the index. No guarantee that it will be faster, but it's worth testing. – spencer7593 Jun 22 '17 at 20:06
  • with just a one-off test, the solution by GordonLinoff was slightly slower than yours (3.96 vs 3.14 seconds). It is by far not a rigorous test but at least, both options make use of the indexing. – nbeuchat Jun 22 '17 at 20:24
  • isn't `in` equivalent to `or` where there are indexes or `in` is even better for case of mysql coz there is optimization for `in` in mysql?? – lily Apr 21 '21 at 04:20
  • @lily: use EXPLAIN to see the difference in execution plan, whether an index is being used. Yes these two forms are semantically equivalent `( foo = 1 or foo = 2 )` and `foo in (1,2)`. In this particular example, the MySQL 5.6 optimizer was using a less efficient execution plan for the form using the tuple comparison `(a,b,c) IN ((1,1,1),(2,2,2))`. Theoretically, yes, the optimizer should be able to expand that, into a form like shown in this answer, and then come up with an execution plan that uses the index. NOTE: behavior observed in MySQL 5.6, Bill Karwin notes this was fixed in 5.7 – spencer7593 Apr 26 '21 at 13:55
  • 1
    why mysql can't optimize itself? – Alex78191 Feb 07 '22 at 19:18
  • 1
    @Alex78191 i expect this^ is fixed in newer versions of MySQL (and/or MariaDB). this^= mysql optimizer generates a different, suboptimal plan for `(x,y) IN ((2,11),(3,13),(5,17))` compared to semantically equivalent `((x=3 and y=11) OR (x=3 and y=13) OR (x=5 and y=17))` – spencer7593 Feb 08 '22 at 16:33
  • in is fixed in mysql 8 – gayavat Nov 29 '22 at 20:16
  • 1
    in is fixed in mysql 5.7 https://stackoverflow.com/questions/40489417/using-tuple-comparison-in-mysql-is-it-efficient – gayavat Nov 29 '22 at 20:24
21

I do not understand your point. The following query is valid MySQL syntax:

SELECT *
FROM my_table
WHERE (x0, y0, x1, y1) IN ((4, 3, 5, 6), ... ,(9, 3, 2, 1));

I would expect MySQL to use the composite index that you have described. But, if it doesn't you could do:

SELECT *
FROM my_table
WHERE x0 = 4 AND y0 = 3 AND x1 = 5 AND y1 = 6
UNION ALL
. . .
SELECT *
FROM my_table
WHERE x0 = 9 AND y0 = 3 AND x1 = 2 AND y1 = 1

The equality comparisons in the WHERE clause will take advantage of an index.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Indeed, it was a valid syntax but was taking howfully long to execute. It seems that the version of MySQL that we are using does not take advantage of the index. – nbeuchat Jun 22 '17 at 19:03
16

MySQL allows row constructor comparisons like you show, but the optimizer didn't know how to use an index to help performance until MySQL 5.7.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 1
    I see, we have an older version of MySQL which made everything very slow. The code was actually working but very slow. @spencer answer fixed it. – nbeuchat Jun 22 '17 at 19:06
2

You can concatenate the four values into a string and check them like that:

SELECT * 
FROM my_table 
WHERE CONCAT_WS(',', x0, y0, x1, y1) IN ('4,3,5,6', ..., '9,3,2,1');
sorayadragon
  • 1,087
  • 7
  • 21
  • 2
    MySQL will need to evaluate the CONCAT_WS function for *every* row in the table. That might make use of an index, but it would be a full scan of the index, all 100,000,000 rows. – spencer7593 Jun 22 '17 at 18:00
0

The way you are doing is giving correct results in the mysql version on my machine. I am using v5.5.55. Maybe you are using an older one. Please check that.

If you still want to solve this problem in your own version or the above mentioned solution doesn't work then only read the next solution.

I am still not clear about data types and range of all your columns here. So I am assuming that data type is integer and range is between 0 to 9. If this is the case you can easily do this as given below.

select * from s1 where x0+10*x1+100*y1+1000*y2 in (4356,..., 9321);
Chandan Purbia
  • 285
  • 4
  • 14
  • With this approach, MySQL will not be able to use a range scan operation on an index `(x0,x1,y1,y2)`. MySQL will evaluate that expression in the where clause for *every* one of the 100,000,000 rows in the table. – spencer7593 Nov 22 '17 at 14:34