1

I have a query formed by an UNION ALL from two tables. The results have to be ordered and paginated (like the typical list of a web application).

The original query (simplified) is:

SELECT name, id
FROM _test1 -- conditions WHERE
UNION ALL
SELECT name, id
FROM _test2 -- conditions WHERE
ORDER BY name DESC LIMIT 10,20

The problem is that the 2 tables have more than 1 million rows each, and the query is very slow.

How can I get an optimized paginated list from a UNION ALL?

Postdata:

I've used the search of Stack Overflow and I've found some similar questions of this, but the answer was incorrect or the question isn't exactly the same. Two examples:

Optimize a UNION mysql query

Combining UNION and LIMIT operations in MySQL query

I'm surprised that in Stack Overflow nobody could answered this question. Maybe it is impossible to do this query more efficiently? What could be a solution of this problem?

Community
  • 1
  • 1
Dr. No
  • 1,306
  • 5
  • 28
  • 57
  • I suspect the problem isn't the `UNION ALL` but rather the `ORDER BY`. – NPE Sep 27 '11 at 18:40
  • A simplified example is useful only if the simplified example can reproduce the problem. Are you able to see the problem with the above two tables, with no where clause and an index on name in both the tables? – Miserable Variable Sep 27 '11 at 18:54
  • 1
    If you have 2 tables and you need a query like this, then it looks more like a design problem. For `LIMIT 10,20` it won't be hard to optimize but for `LIMIT 5000,20` it will be quite hard, if not impossible. – ypercubeᵀᴹ Sep 27 '11 at 18:55
  • The "commented where" is a way to say that maybe could be conditions or maybe not (there are some input fields for search in the list) but both are slow (Maybe I not explained well that). The problem (I think) is the huge amount of data that generate the union, that have to be ordered and at last apply the LIMIT. – Dr. No Sep 27 '11 at 19:27
  • @ypercube You're right. The problem is a "legacy database" that I can't change the structure. – Dr. No Sep 27 '11 at 19:29
  • @aix Yeah, using the Explain Plan, the instruction that cause the most of CPU consumption is the ORDER BY, but I think it's caused by the huge amount of data generated by the UNION ALL – Dr. No Sep 27 '11 at 19:32

1 Answers1

4

I would think that you could use something similar to the solution in your second link to at least help performance, but I doubt that you'll be able to get great performance on later pages. For example:

(   SELECT name, id
    FROM _test1 -- conditions WHERE
    ORDER BY name DESC LIMIT 0, 30
)
UNION ALL
(   SELECT name, id
    FROM _test2 -- conditions WHERE
   ORDER BY name DESC LIMIT 0, 30
)
ORDER BY name DESC
LIMIT 10, 20

You're basically limiting each subquery to the subset of possible rows that might be on the given page. In this way you only need to retrieve and merge 20 rows from each table before determining which 10 to return. Otherwise the server will potentially grab all of the rows from each table, order and merge them, then start trying to find the correct rows.

I don't use MySQL a lot though, so I can't guarantee that the engine will behave how I think it should :)

In any event, once you get to later pages you're still going to be merging larger and larger datasets. HOWEVER, I am of the strong opinion that a UI should NEVER allow a user to retrieve a set of records that let them go to (for example) page 5000. That's simply too much data for a human mind to find useful all at once and should require further filtering. Maybe let them see the first 100 pages (or some other number), but otherwise they have to constrain the results better. Just my opinion though.

Tom H
  • 46,766
  • 14
  • 87
  • 128