0

This is my first attempt at reaching out for help on StackOverflow :)

I have the following table:

CREATE TABLE `tinfinite_visits` (
  `visit_id` int(255) NOT NULL AUTO_INCREMENT,
  `identity_id` int(255) NOT NULL,
  `ip` varchar(39) NOT NULL,
  `loggedin` enum('0','1') NOT NULL DEFAULT '0',
  `url` longtext NOT NULL,
  `realurl` longtext NOT NULL,
  `referrer` longtext NOT NULL,
  `method` enum('GET','POST','HEAD','OPTIONS','PUT','DELETE','TRACE','CONNECT','PATCH') NOT NULL,
  `client` longtext NOT NULL,
  `referring` longtext NOT NULL,
  `timestart` datetime NOT NULL,
  `timeend` datetime NOT NULL,
  PRIMARY KEY (`visit_id`),
  KEY `timestart` (`timestart`),
  KEY `identity_id` (`identity_id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8;

At some point, I need to get some data from this table to generate a line chart. I currently do it using 5 different queries, for 5 different time intervals (total, year, month, week, day).

The table currently has around 200.000 rows, but will got a lot bigger (even tens of millions records).

While my queries work perfectly for the purpose, I'm trying to find a better way performance-wise.

So I would enormously appreciate any tips/advice on how can I improve the queries' performance, preferably even merging all 5 queries into 1, if possible.

The queries I'm currently using, their EXPLAINs, as well as their execution times (for around 200.000 rows) are as follows:

Day query:

SELECT COUNT(DISTINCT(`identity_id`)) AS visits, DATE_FORMAT(CONVERT_TZ(timestart, '-5:00', '+3:00'), '%l%p') AS unit
  FROM tinfinite_visits
  WHERE `timestart` >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
  GROUP BY unit
  ORDER BY `timestart` ASC

Explain:

+----+-------------+----------------------+-------+---------------+-----------+---------+-----+-------+------------------------------+
| id | select_type | table                | type  | possible_keys | key       | key_len | ref | rows  | Extra                        |
+----+-------------+----------------------+-------+---------------+-----------+---------+-----+-------+------------------------------+
| 1  | SIMPLE      | tinfinite_visits     | range | timestart     | timestart | 8       |     | 11113 | Using where; Using temporary |
+----+-------------+----------------------+-------+---------------+-----------+---------+-----+-------+------------------------------+

Time: 0.011280059814453

Week query:

SELECT COUNT(DISTINCT(`identity_id`)) AS visits, DATE_FORMAT(CONVERT_TZ(timestart, '-5:00', '+3:00'), '%a') AS unit
  FROM tinfinite_visits
  WHERE `timestart` >= DATE_SUB(NOW(), INTERVAL 7 DAY)
  GROUP BY unit
  ORDER BY `timestart` ASC

Explain:

+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| id | select_type | table            | type | possible_keys | key | key_len | ref | rows   | Extra                                        |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| 1  | SIMPLE      | tinfinite_visits | ALL  | timestart     |     |         |     | 205897 | Using where; Using temporary; Using filesort |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+

Time: 0.13543295860291

Month query:

SELECT COUNT(DISTINCT(`identity_id`)) AS visits, DATE_FORMAT(CONVERT_TZ(timestart, '-5:00', '+3:00'), '%d') AS unit
  FROM tinfinite_visits
  WHERE `timestart` >= DATE_SUB(NOW(), INTERVAL 28 DAY)
  GROUP BY unit
  ORDER BY `timestart` ASC

Explain:

+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| id | select_type | table            | type | possible_keys | key | key_len | ref | rows   | Extra                                        |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| 1  | SIMPLE      | tinfinite_visits | ALL  | timestart     |     |         |     | 205897 | Using where; Using temporary; Using filesort |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+

Time: 0.21460795402527

Year query:

SELECT COUNT(DISTINCT(`identity_id`)) AS visits, DATE_FORMAT(`timestart`, '%b') AS unit
  FROM tinfinite_visits
  WHERE `timestart` >= DATE_SUB(NOW(), INTERVAL 1 YEAR)
  GROUP BY unit
  ORDER BY `timestart` ASC

Explain:

+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| id | select_type | table            | type | possible_keys | key | key_len | ref | rows   | Extra                                        |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| 1  | SIMPLE      | tinfinite_visits | ALL  | timestart     |     |         |     | 205897 | Using where; Using temporary; Using filesort |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+

Time: 0.50977802276611

Overall query:

SELECT COUNT(DISTINCT(`identity_id`)) AS visits, DATE_FORMAT(`timestart`, '%b') AS unit
  FROM tinfinite_visits
  WHERE `timestart` >= DATE_SUB(NOW(), INTERVAL 100 YEAR)
  GROUP BY unit
  ORDER BY `timestart` ASC

Explain:

+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| id | select_type | table            | type | possible_keys | key | key_len | ref | rows   | Extra                                        |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| 1  | SIMPLE      | tinfinite_visits | ALL  | timestart     |     |         |     | 205897 | Using where; Using temporary; Using filesort |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+

Time: 0.52196192741394

Thank you very, very much!

  • Premature optimization will always cause troubles. How many records you think there will be? What shows `EXPLAIN` now? – Alma Do Oct 01 '13 at 10:31
  • 1
    The explain will change with time. Right now it looks optimal. It uses where to filter, then stores grouped values, then orders those values. – AdrianBR Oct 01 '13 at 11:30

1 Answers1

1

Salut Emilian, Explain sometimes changes with the number of rows.

However, since you are using calculated column in the grouping and where, you can choose to implement a date dimension table.

you can find date dimension table code for almost any DB:

google.com/search?q=date+dimension+table

Why? See below Time and date dimension in data warehouse

Community
  • 1
  • 1
AdrianBR
  • 2,762
  • 1
  • 15
  • 29