Table structure:

CREATE TABLE `mytable` (
  `id` varchar(8) NOT NULL,
  `event` varchar(32) NOT NULL,
  `event_date` date NOT NULL,
  `event_time` time NOT NULL,
  KEY `id` (`id`) 
) ENGINE=MyISAM DEFAULT CHARSET=utf8

The data in this table looks like this:

 id      | event      | event_date  | event_time
---------+------------+-------------+-------------
ref1     | someevent1 | 2010-01-01  | 01:23:45
ref1     | someevent2 | 2010-01-01  | 02:34:54
ref1     | someevent3 | 2010-01-18  | 01:23:45
ref2     | someevent4 | 2012-10-05  | 22:23:21
ref2     | someevent5 | 2012-11-21  | 11:22:33

The table contains about 500,000,000 records similar to these.

The query I'd like to ask about here looks like this:

SELECT     *
FROM       `mytable`
WHERE      `id` = 'ref1'
ORDER BY   event_date DESC,
           event_time DESC
LIMIT      0, 500

The EXPLAIN output looks like:

select_type:   SIMPLE
table:         E
type:          ref
possible_keys: id
key:           id
key_len:       27
ref:           const     
rows:          17024 (a common example)
Extra:         Using where; Using filesort

Purpose: This query is generated by a website. The LIMIT values drive a page-navigation element, so when the user wants to see older entries they are adjusted to 500, 500, then 1000, 500, and so on.

Since some id values occur in a great many rows, the query slows down as the number of matching rows grows. Profiling these slow queries showed that sorting is the cause: the MySQL server spends most of the query time sorting the data. Indexing the fields event_date and event_time didn't change that much.

Example SHOW PROFILE Result, sorted by duration:

state          | duration/sec | percentage
---------------|--------------|-----------
Sorting result |     12.00145 |   99.80640
Sending data   |      0.01978 |    0.16449
statistics     |      0.00289 |    0.02403
freeing items  |      0.00028 |    0.00233
...
Total          |     12.02473 |  100.00000

Now the question:

Before delving deeper into MySQL variables like sort_buffer_size and other server configuration options, can you think of any way to change the query or the sorting behaviour so that sorting is no longer such a performance hog, while the purpose of the query stays intact?

I don't mind a bit of out-of-the-box-thinking.

Thank you in advance!

Bjoern
  • Have you tried setting indexes on `event_date` and `event_time`? – Martin Nov 09 '12 at 13:18
  • @Martin Yes, I have. As described, this didn't change the behaviour very much. – Bjoern Nov 09 '12 at 13:20
  • Try to add multi-column index (id, event_date desc, event_time desc). – sufleR Nov 09 '12 at 13:25
  • If you want to improve your query, don't use `select *`; it will be faster if you use `select col1, col2, ..., colN` (naming all the columns). – jcho360 Nov 09 '12 at 13:26
  • @jcho360 That would not change anything to the fact that the sorting is slow. – Martin Nov 09 '12 at 13:27
  • @Martin Are you sure? I only said it would improve things a little, at least: http://stackoverflow.com/questions/65512/which-is-faster-best-select-or-select-column1-colum2-column3-etc – jcho360 Nov 09 '12 at 13:28
  • @jcho360 @Martin The real query actually returns just the columns I want to, I've just posted `SELECT *` to make things easier here. It doesn't affect the sorting very much - at least not in my case. – Bjoern Nov 09 '12 at 13:34
  • Ah.. okay. Tried a threeway index already? – Martin Nov 09 '12 at 13:34
  • @Martin Working on this at the moment, it just takes ages to build a substantial amount of test data. – Bjoern Nov 09 '12 at 13:40
  • @Bjoern I can tell.. Good luck in either way! – Martin Nov 09 '12 at 13:41

3 Answers

As I wrote in my comment, a multi-column index on (id, event_date desc, event_time desc) may help.

If this table is going to grow fast, you should consider adding an option in the application that lets the user select data for a particular date range.

Example: the first step always returns 500 records, but to fetch further records the user has to set a date range first and then paginate within it.
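
A rough sketch of what such a date-restricted page query could look like. The date bounds are made-up example values that the application would take from the user's selection; this is only an illustration of the idea, not a tested solution for the table in the question.

```sql
-- Assumption: the user picked January 2010 as the date range.
-- The extra range condition reduces the number of rows that
-- have to be sorted before LIMIT is applied.
SELECT     *
FROM       `mytable`
WHERE      `id` = 'ref1'
  AND      event_date BETWEEN '2010-01-01' AND '2010-01-31'
ORDER BY   event_date DESC,
           event_time DESC
LIMIT      0, 500;
```

Pagination then works as before, just inside the chosen range (LIMIT 500, 500 and so on).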

sufleR
  • Thank you for this suggestion, I'll try something like this, if it cannot be solved otherwise. – Bjoern Nov 09 '12 at 13:53

Indexing is most likely the solution; you just have to do it right. See the mysql reference page for this.

The most effective way to do it is to create a three-part index on (id, event_date, event_time). You can specify event_date desc, event_time desc in the index, but I don't think it's necessary: both sort columns run in the same direction, and MySQL can read an index backwards.
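
As a sketch, the index could be created like this. The index name `idx_id_date_time` is just a placeholder I made up, and on a table of this size building the index will take a while. Re-running EXPLAIN afterwards is the way to check whether "Using filesort" has actually gone away in your setup.

```sql
-- Hypothetical index name. Column order matters: the equality
-- column (id) comes first, followed by the ORDER BY columns in
-- the same order as in the query.
ALTER TABLE `mytable`
  ADD INDEX `idx_id_date_time` (`id`, `event_date`, `event_time`);

-- Check the plan again: ideally `key` shows the new index and
-- `Extra` no longer mentions a filesort.
EXPLAIN
SELECT     *
FROM       `mytable`
WHERE      `id` = 'ref1'
ORDER BY   event_date DESC,
           event_time DESC
LIMIT      0, 500;
```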

histocrat
  • I've played around with the indexing a bit, but haven't really achieved any success so far. I haven't yet tried the three-part index, I'll try that. Thanks for the suggestion. – Bjoern Nov 09 '12 at 13:38
  • After two days of playing around with the different approaches posted here, this one was the answer that best fit my problem at hand. Thank you very much! – Bjoern Nov 11 '12 at 21:36

I would start by doing what sufleR suggests - the multi-column index on (id, event_date desc, event_time desc).

However, according to http://dev.mysql.com/doc/refman/5.0/en/create-index.html, the DESC keyword is accepted in an index definition but is ignored, so index values are always stored in ascending order. That's a bit of a pain, so try it and see if it improves the performance, but it probably won't.

If that's the case, you may have to cheat by creating a "sort_column", with an automatically decrementing value (pretty sure you'd have to do this in the application layer, I don't think you can decrement in MySQL), and add that column to the index.

You'd end up with:

 id      | event      | event_date  | event_time | sort_value
---------+------------+-------------+------------+------------
ref1     | someevent1 | 2010-01-01  | 01:23:45   | 0
ref1     | someevent2 | 2010-01-01  | 02:34:54   | -1
ref1     | someevent3 | 2010-01-18  | 01:23:45   | -2
ref2     | someevent4 | 2012-10-05  | 22:23:21   | -3
ref2     | someevent5 | 2012-11-21  | 11:22:33   | -4

and an index on id and sort_value.
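
Roughly, that could look like the following. The column type, the default and the index name `idx_id_sort` are my assumptions, and the application would still have to fill `sort_value` with a decreasing number on every insert.

```sql
-- Assumed column definition and hypothetical index name.
ALTER TABLE `mytable`
  ADD COLUMN `sort_value` INT NOT NULL DEFAULT 0,
  ADD INDEX `idx_id_sort` (`id`, `sort_value`);

-- Newer rows carry smaller values, so an ascending read of the
-- index returns the newest events first.
SELECT     *
FROM       `mytable`
WHERE      `id` = 'ref1'
ORDER BY   sort_value ASC
LIMIT      0, 500;
```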

Dirty, but the only other suggestion is to reduce the number of records matching the where clause in other ways - for instance, by changing the interface not to return 500 records, but records for a given date.

Neville Kuyt
  • I'll start with trying the three-column index. If this doesn't work, I'll go for the sort_value. Thank you for this suggestion. – Bjoern Nov 09 '12 at 13:39
  • The multi-column index was indeed the best option in my setup. Thanks for the other suggestion anyway! – Bjoern Nov 11 '12 at 21:37