2

I have a (big) table where I do a query on, using 3 fields in the WHERE. One of these fields has an index (a date), and I'm looking for hits in the past 3 months. While it'll never be a quick query, the least I hope for is the index on this date to be used.

This is my query:

SELECT id
FROM statsTable
WHERE 1
   AND ip            =  'ipgoeshere'
   AND anotherstring =  'Quite a long string goes here, something like this or even longer'
   AND `date`        >  DATE_ADD( NOW( ) , INTERVAL -3 MONTH ) 

And it's explain:

id  select_type table       type    possible_keys   key     key_len ref     rows    Extra
1   SIMPLE      statsTable  ALL     date            NULL    NULL    NULL    4833721 Using where; Using filesort   

This is a complete table-scan, the number of rows is off because of INNODB-row-counting I guess, but that's all of em. This takes about 30 seconds.

If I force the index like so, I get the expected result:

SELECT id
FROM statsTable FORCE INDEX (date)
WHERE 1
   AND ip            =  'ipgoeshere'
   AND anotherstring =  'Quite a long string goes here, something like this or even longer'
   AND `date`        >  DATE_ADD( NOW( ) , INTERVAL -3 MONTH ) 

Again, the explain:

id  select_type table       type    possible_keys   key     key_len ref     rows    Extra
1   SIMPLE      statsTable  range   date            date    8       NULL    1120172 Using where

Now we have 'only' a million results, but this gets done "lighting" quick (as in, 3 seconds instead of 30).

The table:

CREATE TABLE IF NOT EXISTS `statsTable` (

  `id`            int(11) unsigned NOT NULL AUTO_INCREMENT,
  `date`          datetime NOT NULL,
  `ip`            varchar(15) NOT NULL,
  `anotherstring` varchar(255) NOT NULL,

  PRIMARY KEY (`id`),
  KEY `date` (`date`)

) ENGINE=InnoDB;

The strange thing is: I have this table running on another database too (running on a different server), and the index IS being used in that instance. I can't see what could be the issue here. Is there a setting I missed? Or could it be some other minor difference? Apart from the differences, I can't see why above query wouldn't use the key.

I have run OPTIMIZE TABLE and, as @DhruvPathak suggested ANALYZE TABLE, but the explain still stays the same. I also tried an ALTER TABLE as suggested by a friend, to rebuild the index. No luck.

Nanne
  • 64,065
  • 16
  • 119
  • 163

3 Answers3

1

Run ANALYZE TABLE once, and see if that helps in correcting the choice of the optimizer.

http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html

This will also help : MySQL not using indexes with WHERE IN clause?

Can you try editing your query ?

Why is there a reduntant TRUE condition WHERE 1 in the query ?

Change

SELECT id
FROM statsTable
WHERE 1
   AND ip            =  'ipgoeshere'
   AND anotherstring =  'Quite a long string goes here, something like this or even longer'
   AND `date`        >  DATE_ADD( NOW( ) , INTERVAL -3 MONTH ) 

To

SELECT id
FROM statsTable
where  `date`        >  DATE_ADD( NOW( ) , INTERVAL -3 MONTH ) 
AND ip            =  'ipgoeshere'
AND anotherstring =  'Quite a long string goes here, something like this or even longer'
Community
  • 1
  • 1
DhruvPathak
  • 42,059
  • 16
  • 116
  • 175
  • I think I did run `OPTIMIZE` at some point, but haven't run `ANALYZE`. I'll try that now. That other question mentions "small tables", I don't think this is one of those :P. But the growth of 10/15% has been there obviously :) – Nanne Sep 01 '11 at 06:57
  • AN `analyze` did not help sadly :( – Nanne Sep 01 '11 at 06:57
  • That shouldn't matter, but i'll try it. The order should be ignored by MYSQL as the optimizer decides what to do, and the redundant '1' is there just for visual clearity and because you can put `"AND 'conditions'"` rows there without having difference between the first (not using an 'and') and the second row. I'll let you know In a bit – Nanne Sep 01 '11 at 07:09
  • Yes, I agree , order is taken care of by MySQL, but I was not sure about redundant true condition – DhruvPathak Sep 01 '11 at 07:11
  • Still same behaviour. No use of the index and slow, with `FORCE` it's quick – Nanne Sep 01 '11 at 07:12
1

based on your query format, the ideal index should be on

ip, date

or

ip, date, anotherstring <-- this could be overkill

and

order by null <-- eliminate the file sort

lastly, it could be your another database contains far lesser record

ajreal
  • 46,720
  • 11
  • 89
  • 119
  • You're probably correct about the index, but at the moment I don't want to many indexes on this table due to size (i'll have to test some with the effects, but the bigger it all gets, the more trouble you get with backups and all. might be minor, but still need to test some stuff about that first). Date is used for more operations, so there should be an index on that field alone regardless. And It would be enough if that is used in this query, so the question stays: why isn't it used? The other database is a bit bigger, about 9 milj. rows I think. – Nanne Sep 01 '11 at 07:22
1

The index is not used because the execution planner decides that it's best to full scan the table than use the index. This happens when the index is not selective enough for a query.

If the dates in your range check are more than 10-20% of the whole table, then the planner decides that scanning (sequentially) the whole table will be faster than using the index and retrieving the rows that fall in that range (this retrieval will not be sequential as the rows will be scattered all over the table).

That's why you see different behaviour with different sets of data.


For your query to work best, you can create index on:

(ip, yourDateField)

or

(anotherstring, yourDateField)

or

(ip, anotherstring, yourDateField)

I think the first option will be selective enough. No need to add the long VARCHAR(255) field in an index. Alternatively, use the FORCE INDEX that seems to work fine in your case.

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • I did try to add that first index in test after @ajreal also mentioned that, but it also skips that index. Would that still be because it is not selective enough? I don't know if the dates are more then 10-20%. Do you mean the range or the number of rows in that range? The number of rows might be more, the range isn't (that's because the number of rows added per hour has been increasing). I'm still unsure why it is happening in this case, but pasting all my data would be tricky (not to mention illegal). I might have to stay with the force. – Nanne Sep 01 '11 at 07:53