Efficiently caching and paginating large MySQL resultset in PHP

Question

I have a distributed PHP5 application that allows users to search and paginate through MySQL data. Sometimes, their queries can contain hundreds of thousands of records and these are always sorted by timestamp, so the volume of records and the order-by are hits on performance as-is.

I need to make pagination more efficient. Long ago I ditched SQL_CALC_FOUND_ROWS in favor of a COUNT(*) but this means the query still runs twice, once with the limit and once without.

I could combine the two queries and rely on PHP to paginate the result set instead of using a LIMIT clause. Update: This isn't possible because the large dataset I'm trying to work with would exceed the memory limits set for php. Since this is distributed I can't rely on everyone to change their settings.

However, I'd really like to cache the results and paginate that way - I'd be much faster than having to query MySQL again. Because it's a distibuted application I won't have any caching system I'd normally use like Memcached, etc. Session storage would likely be too small.

Are there any alternatives? Is there a better way to paginate the results or a safe way to cache the results and let PHP paginate them?

score 0 · Answer 1 · answered Apr 04 '13 at 02:56

0

Do you need to show the exact number of results? You could always tell your users "Found more than 1000 results". Also when you do count(*) it might not necessarily look through all the records in your table, it uses indexes on your table for efficient counting.

If your table is humongous, you can always use the SHOW TABLE STATUS query to find information about your table. If it's a MyISAM table, this command will give you the exact number of rows, but if it's an InnoDB, it might not be as accurate.

Hope this helps

answered Apr 04 '13 at 02:56

Reza S

9,480
3
54
84

Thanks, unfortunately it doesn't though. Users are running reports and are querying a very wide variety of the data. Users typically have anywhere from a few hundred thousand records, to 100 million. It's hard to know what data they're going to query so even setting up indexes is difficult. We use the total number of results in calculating how many pages there are, so they can paginate. For now, I just cache the count until the search conditions change so we don't re-calc that every page change. – helion3 Apr 04 '13 at 03:07
If it's 100 million rows, and you're at most showing 100 rows per page, does it matter to the user to say "showing page 1/1000000" or showing page "1/1000001"? i doubt it! You might as well say "showing page 1 out of many pages" – Reza S Apr 04 '13 at 03:21
Then how would you recommend showing the page number for the last set of results. They need to be able to quickly jump to the end – helion3 Apr 04 '13 at 03:22
You have to find out how users are using your system, if jumping to the end is all they need to do, then why don't you sort your records descending? that way they'll only be dealing with the first few pages. If they need to jump around from anywhere between page 1 and page 1000000, you have a serious design flaw and need to provide better filters to narrow down your search result. – Reza S Apr 04 '13 at 03:24
Or you could even replace "page 1000001" with "last page", users don't care how many pages you have, and once they click last page, then grab the last 100 rows. Another option: http://stackoverflow.com/questions/1243952/how-can-i-speed-up-a-mysql-query-with-a-large-offset-in-the-limit-clause – Reza S Apr 04 '13 at 03:28
I know extremely well how they use our system. User's have complete control over the queries and filters and many of them aren't keeping the filters narrow. Their fault, sure - but it still makes them assume the application is slow. We give them to choice of sorting, but ORDER BY on a several hundred thousand rows is always going to be a performance kill. We make our users very well aware of this but they do it anyway. I'm trying to accommodate as many people as possible. Don't start accusing me of design flaws when you have no insight into our users or the application. – helion3 Apr 04 '13 at 03:34
Didn't mean to offend. carry on – Reza S Apr 04 '13 at 03:50

Efficiently caching and paginating large MySQL resultset in PHP

1 Answers1