0

I have two tables that are all the same, except one has a timestamp value column and the other has a datetime value column. Indexes are the same. Values are the same.

But when I run SELECT station, MAX(timestamp) AS max_timestamp FROM stations GROUP BY station; if stations is the one with timestamps, it executes really fast, and if I try it with the datetime one, well I haven't seen one query executes. In both cases the timestampcolumn is indexed, only the type changes.

Where should I start looking for? Or is datetime just not suitable for search and indexing ?

Here is what EXPLAIN gives :

+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
|  1 | SIMPLE      | stations | range | NULL          | stamp   | 33      | NULL | 1511 | Using index for group-by |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+

+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows    | Extra |
+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------+
|  1 | SIMPLE      |stations2 | index | NULL          | station | 2       | NULL | 3025467 |       |
+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------+

And the SHOW:

+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| stations | CREATE TABLE `stations` (
  `station` varchar(10) COLLATE utf8_bin DEFAULT NULL,
  `available` smallint(6) DEFAULT NULL,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  UNIQUE KEY `stamp` (`station`,`timestamp`),
  KEY `time` (`timestamp`),
  KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| stations2 | CREATE TABLE `stations2` (
  `station` smallint(5) unsigned NOT NULL,
  `available` smallint(5) unsigned DEFAULT NULL,
  `timestamp` datetime DEFAULT NULL,
  KEY `station` (`station`),
  KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Marcus Adams
  • 53,009
  • 9
  • 91
  • 143
Cystack
  • 3,301
  • 5
  • 35
  • 33

1 Answers1

1

You can see from the EXPLAIN that there is no key being used for selection (NULL for possible_keys). You don't have a WHERE clause, so this makes sense.

MySQL can utilize an index to determine MAX, and it can utilize an index to optimize GROUP BY. However, to be able to optimize both combined, you would need both the column in your MAX() function and the column in your GROUP BY clause to be in a compound index. In the first table, you have this compound index as a unique key called 'stamp'. The EXPLAIN result shows that MySQL is using that index.

On the second table, you don't have this compound index, so MySQL is having to perform a lot more work. It has to manually group the results and keep the MAX value for each station by manually scanning each row. If you add the same compound index on the second table, you will see similar performance between the two.

However, TIMESTAMP will still slightly outperform DATETIME because TIMESTAMP is treated as a single 4 byte integer value, which is processed faster than an 8 byte special DATETIME value. The larger the data set, the larger difference you will see.

Marcus Adams
  • 53,009
  • 9
  • 91
  • 143
  • Thank you 1000 times. So 2 questions : if I had a compound station-timestamp index I should get the same results ? & since I'm going to start over, should I use INT (with UNIX) instead of DATETIME ? For efficiency purpose... – Cystack May 25 '12 at 17:02
  • @Cystack, Yes, if you add the index, the EXPLAIN result should be similar, and you should have similar performance. Using TIMESTAMP will be faster (only 4 bytes) than INT (8 bytes), which will be faster than DATETIME (8 bytes but with special processing), however, the difference isn't that great. You might just pick the easiest to implement, but if everything else is equal, pick TIMESTAMP. – Marcus Adams May 25 '12 at 17:05
  • damn... the migration is done and timestamp is actually close to 10 times faster :( – Cystack May 26 '12 at 10:00