0

Right now, I've a SQL query such as:

      SELECT
        SUM(dtraff_web) AS dtraff_web,
        SUM(dtraff_ftp) AS dtraff_ftp,
        SUM(dtraff_mail) AS dtraff_mail,
        SUM(dtraff_pop) AS dtraff_pop
      FROM domain_traffic WHERE `dtraff_time` BETWEEN
        UNIX_TIMESTAMP(((LAST_DAY(CURDATE()) + INTERVAL 1 DAY) - INTERVAL 1 MONTH ))
        AND
        UNIX_TIMESTAMP((LAST_DAY(CURDATE()) + INTERVAL 1 DAY ))
      AND
        domain_id = ?

I've added an index which covers all fields (covering index) such as:

ALTER TABLE `domain_traffic` ADD INDEX `full_covering` (`domain_id`, `dtraff_time`, `dtraff_web`, `dtraff_ftp`, `dtraff_mail`, `dtraff_pop`) USING BTREE;

to avoid full scan but when explaining the query I get: 'Using where; Using index'

I would avoid 'Using where'. It is possible when range is involved?

Current `explain' result is:

mysql>           EXPLAIN SELECT SQL_NO_CACHE
    ->             SUM(dtraff_web) AS dtraff_web,
    ->             SUM(dtraff_ftp) AS dtraff_ftp,
    ->             SUM(dtraff_mail) AS dtraff_mail,
    ->             SUM(dtraff_pop) AS dtraff_pop
    ->           FROM domain_traffic WHERE `dtraff_time` BETWEEN
    ->             UNIX_TIMESTAMP(((LAST_DAY(CURDATE()) + INTERVAL 1 DAY) - INTERVAL 1 MONTH ))
    ->             AND
    ->             UNIX_TIMESTAMP((LAST_DAY(CURDATE()) + INTERVAL 1 DAY ))
    ->           AND
    ->             domain_id = 1;
+----+-------------+----------------+------------+-------+----------------------------------+---------------+---------+------+------+----------+--------------------------+
| id | select_type | table          | partitions | type  | possible_keys                    | key           | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+----------------+------------+-------+----------------------------------+---------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | domain_traffic | NULL       | range | i_unique_timestamp,full_covering | full_covering | 12      | NULL |  959 |   100.00 | Using where; Using index |
+----+-------------+----------------+------------+-------+----------------------------------+---------------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0,00 sec)

mysql> 

Table schema is:

mysql> show create table domain_traffic;
+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table          | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| domain_traffic | CREATE TABLE `domain_traffic` (
  `dtraff_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `domain_id` int(10) unsigned NOT NULL,
  `dtraff_time` bigint(20) unsigned NOT NULL,
  `dtraff_web` bigint(20) unsigned DEFAULT '0',
  `dtraff_ftp` bigint(20) unsigned DEFAULT '0',
  `dtraff_mail` bigint(20) unsigned DEFAULT '0',
  `dtraff_pop` bigint(20) unsigned DEFAULT '0',
  PRIMARY KEY (`dtraff_id`),
  UNIQUE KEY `i_unique_timestamp` (`domain_id`,`dtraff_time`),
  KEY `full_covering` (`domain_id`,`dtraff_time`,`dtraff_web`,`dtraff_ftp`,`dtraff_mail`,`dtraff_pop`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=5452211 DEFAULT CHARSET=latin1 |
+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,00 sec)

mysql>
  • what is the table schema, what is the explain output. – Drew Jul 21 '16 at 17:37
  • @drew InnoDB; expected explain in extra field would be 'Using index' – Laurent DECLERCQ a.k.a Nuxwin Jul 21 '16 at 17:42
  • By the way, as long as `type` is range and not ALL, you are in [good hands](http://stackoverflow.com/a/38002986). You don't need to go overboard with covered indexes as you know they will have deterimental effects for upserts. A partial index use is often stellar enough. – Drew Jul 21 '16 at 17:42
  • Innodb is the engine type. A schema would mean the output of `show create table xyz` for each relevant table (xyz) in your question. – Drew Jul 21 '16 at 17:42
  • @drew Added explain result as you requested. Can you give me a sample for partial index so far? Thanks ;) – Laurent DECLERCQ a.k.a Nuxwin Jul 21 '16 at 17:44
  • 1
    I will take a look. As for covered versus partial-use, check out [this](http://stackoverflow.com/a/38206411). Often just get a ways left-most (say, 2 left-most columns) is zippy enough. And a decent balance. – Drew Jul 21 '16 at 17:47
  • @Drew Added table schema as requested. Thank you for your help. – Laurent DECLERCQ a.k.a Nuxwin Jul 21 '16 at 17:48

1 Answers1

1

I don't put a ton of stock in Explain. It's a clue, but it is still voodoo. The same explain output from two similar queries can return the same Explain output, yet different performance (reference forthcoming). Which means the Optimizer rules.

The insert routine below is used for anyone that wants to take it from here. Without adding adequate size to a test table, the index is not used for queries. The manual states that of course, just saying. Because some people think indexes are always used if they seem like they should be (ie: a guy's test table with 10 rows won't be using indexes).

Below is some chicken scratch. Hopefully it is helpful.

create schema nuxwin099a;
use nuxwin099a;

CREATE TABLE `domain_traffic` (
  `dtraff_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `domain_id` int(10) unsigned NOT NULL,
  `dtraff_time` bigint(20) unsigned NOT NULL,
  `dtraff_web` bigint(20) unsigned DEFAULT '0',
  `dtraff_ftp` bigint(20) unsigned DEFAULT '0',
  `dtraff_mail` bigint(20) unsigned DEFAULT '0',
  `dtraff_pop` bigint(20) unsigned DEFAULT '0',
  PRIMARY KEY (`dtraff_id`),
  UNIQUE KEY `i_unique_timestamp` (`domain_id`,`dtraff_time`),
  KEY `full_covering` (`domain_id`,`dtraff_time`,`dtraff_web`,`dtraff_ftp`,`dtraff_mail`,`dtraff_pop`) USING BTREE
) ENGINE=InnoDB;

Stored proc to insert random rows:

drop procedure if exists insertMany;
DELIMITER $$
create procedure insertMany
(
    howMany int
)
BEGIN
    DECLARE soFar int default 0;
    WHILE soFar<howMany DO
        -- insert ignore is used because of `i_unique_timestamp`
        insert ignore domain_traffic (domain_id,dtraff_time,dtraff_web,dtraff_ftp,dtraff_mail,dtraff_pop) values
        (1,rand()*2000000+unix_timestamp(now())-3000000,1,2,3,4);

        set soFar=soFar+1;
        if soFar>500 then
            set soFar=howMany; -- hah, you can't trick me. I am not doing that many. RAND() is slow.
        end if;
    END WHILE;
END$$
DELIMITER ; 

Test Stored Proc:

call insertMany(1000);
-- select count(*) from domain_traffic;

.

select from_unixtime(dtraff_time) from domain_traffic;
select 60*60*24*30;
--  2.6M sec/month

1 469 124 890
1 469 125 020
1 469 125 042
select unix_timestamp(now()); -- seconds since epoch
select current_timestamp();

Test some explain output (3 conditions):

Stub A:
WHERE `dtraff_time` BETWEEN 1 and 2 

Stub B:
WHERE `dtraff_time` BETWEEN 1400000000 and 1500000000 

Stub C:
WHERE `dtraff_time` BETWEEN UNIX_TIMESTAMP(((LAST_DAY(CURDATE()) + INTERVAL 1 DAY) - INTERVAL 1 MONTH )) and UNIX_TIMESTAMP((LAST_DAY(CURDATE()) + INTERVAL 1 DAY ))

EXPLAIN SELECT  
SUM(dtraff_web) AS dtraff_web, 
SUM(dtraff_ftp) AS dtraff_ftp, 
SUM(dtraff_mail) AS dtraff_mail, 
SUM(dtraff_pop) AS dtraff_pop 
FROM domain_traffic 
WHERE (fill in stubs A to C above, individually)  
AND domain_id = 1; 

Explain results, Stub A:
+----+-------------+----------------+-------+----------------------------------+--------------------+---------+------+------+-----------------------+
| id | select_type | table          | type  | possible_keys                    | key                | key_len | ref  | rows | Extra                 |
+----+-------------+----------------+-------+----------------------------------+--------------------+---------+------+------+-----------------------+
|  1 | SIMPLE      | domain_traffic | range | i_unique_timestamp,full_covering | i_unique_timestamp | 12      | NULL |    1 | Using index condition |
+----+-------------+----------------+-------+----------------------------------+--------------------+---------+------+------+-----------------------+

Explain results, Stub B:
+----+-------------+----------------+-------+----------------------------------+---------------+---------+------+------+--------------------------+
| id | select_type | table          | type  | possible_keys                    | key           | key_len | ref  | rows | Extra                    |
+----+-------------+----------------+-------+----------------------------------+---------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | domain_traffic | range | i_unique_timestamp,full_covering | full_covering | 12      | NULL | 1284 | Using where; Using index |
+----+-------------+----------------+-------+----------------------------------+---------------+---------+------+------+--------------------------+

Explain results, Stub C:
+----+-------------+----------------+-------+----------------------------------+---------------+---------+------+------+--------------------------+
| id | select_type | table          | type  | possible_keys                    | key           | key_len | ref  | rows | Extra                    |
+----+-------------+----------------+-------+----------------------------------+---------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | domain_traffic | range | i_unique_timestamp,full_covering | full_covering | 12      | NULL |  515 | Using where; Using index |
+----+-------------+----------------+-------+----------------------------------+---------------+---------+------+------+--------------------------+

Cleanup:

drop schema nuxwin099a;

What matters to me most is performance, versus fixating over the exact Explain output.

Drew
  • 24,851
  • 10
  • 43
  • 78
  • I don't see any improvement in your answer. You use the same schema. ICP optimization should do the job for my use case with the i_unique_timestamp index but we must be also MySQL 5.5 compatible... So far, it seem that a covering index is really needed, isn't it? – Laurent DECLERCQ a.k.a Nuxwin Jul 21 '16 at 22:49
  • I wasn't changing anything at all. I was documenting Explain output. You gave no data to work with nor any explain stats X vs Y. – Drew Jul 21 '16 at 22:51
  • You said that from your point of view, a covering index is not really necessary and that usage of a partial index using left-most columns should be sufficient. The question here is: Which left-most columns? From 'select' part or 'where' part? Sorry for the misunderstanding ;) – Laurent DECLERCQ a.k.a Nuxwin Jul 21 '16 at 22:54
  • To me you have it covered. There is not a column in the select list + `where` that is not in your composite. I said one should not get worked up over explain output. Meaning, maybe you have achieved nirvana with the thing. But you have not shown any timings for various indexes in place. Perhaps look into `analyze table` – Drew Jul 21 '16 at 22:56
  • You have a whole system there with other queries and other requirements. Random people on the stack saying this should be your index strategy without knowing your other needs ---> irresponsible. – Drew Jul 21 '16 at 22:57
  • I'll provide stats ;) BTW: I known that I'm covering all fields already with the 'full_convering' index. Question was: Why there is still usage of 'where' and not only 'index' with that covering index. It seem to me here that when type is 'range', there is no way to get result only from index. – Laurent DECLERCQ a.k.a Nuxwin Jul 21 '16 at 22:58
  • It seem that the i_unique_timestamps index is sufficicent after all. With 4 million rows, I get similar time result using that index or a covering index (tested with MySQL 5.5). – Laurent DECLERCQ a.k.a Nuxwin Jul 22 '16 at 00:25
  • What were the time differences in the two strategies? – Drew Jul 22 '16 at 00:32
  • 0.0029 with i_unique_timestamp index -> i_unique_timestamp (domain_id,dtraff_time) ; 0.0011 with i_covering_index -> i_covering_index (domain_id,dtraff_time,dtraff_web,dtraff_ftp,dtraff_mail,dtraff_pop) There is a little difference but I don't think that is really relevant, isn't it? – Laurent DECLERCQ a.k.a Nuxwin Jul 22 '16 at 00:43
  • 1
    sounds like `i_unique_timestamp` is more than sufficient (and ditch the wider 6 column covered) as it would only bog down everything else based on what I am hearing. – Drew Jul 22 '16 at 00:49
  • Yep, you're right. I learned one thing today: The covering indexes are pretty useless with range queries... They are more relevant when there is only constants involved, right? – Laurent DECLERCQ a.k.a Nuxwin Jul 22 '16 at 00:54
  • 1
    It would appear it represents the *filtering* aspect of index then "where from there" (the filter then mini-search from there) has to be represented somehow in a simplistic way in the Extra column. Getting great clarity on it is tough and guesswork. So we test. I am trying to get some better response to you better tied up with other stuff right now. – Drew Jul 22 '16 at 01:00