3

I need to optimize how I analyze a rather large data set, and I'm unsure of what the next steps are. I have done a fair bit of MySQL configuration tuning.

I have this InnoDB table:

+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| id             | int(250)     | NO   | PRI | NULL    | auto_increment |
| memory         | int(15)      | YES  | MUL | NULL    |                |
| q              | varchar(250) | YES  | MUL | NULL    |                |
| created        | datetime     | YES  |     | NULL    |                |
| modified       | datetime     | YES  |     | NULL    |                |
| dt             | datetime     | YES  | MUL | NULL    |                |
| site_id        | int(250)     | NO   | MUL | NULL    |                |
| execution_time | int(11)      | YES  | MUL | NULL    |                |
+----------------+--------------+------+-----+---------+----------------+

Here is a sample of 10 rows:

+-----------+----------+-----------------+---------------------+---------------------+---------------------+---------+----------------+
| id        | memory   | q               | created             | modified            | dt                  | site_id | execution_time |
+-----------+----------+-----------------+---------------------+---------------------+---------------------+---------+----------------+
| 266864867 | 38011080 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:04:44 |     890 |           1534 |
| 266864868 | 46090184 | node/16432      | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:04:46 |     890 |            840 |
| 266864869 | 50329248 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:05:16 |     890 |           2500 |
| 266864870 | 38011272 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:01 |     890 |           1494 |
| 266864871 | 46087732 | node/16432      | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:03 |     890 |            850 |
| 266864872 | 30304428 | node/303        | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:12 |     890 |            113 |
| 266864873 | 50329412 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:25 |     890 |           2465 |
| 266864874 | 28253112 | front_page      | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:25 |     890 |             86 |
| 266864875 | 28256044 | front_page      | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:08:32 |     890 |             81 |
| 266864876 | 38021072 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:08:55 |     890 |           1458 |
+-----------+----------+-----------------+---------------------+---------------------+---------------------+---------+----------------+

Here are the table indexes:

+----------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table    | Non_unique | Key_name             | Seq_in_index | Column_name    | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
| memories |          0 | PRIMARY              |            1 | id             | A         |     8473766 |     NULL | NULL   |      | BTREE      |         |
| memories |          1 | index_dt             |            1 | dt             | A         |     1210538 |     NULL | NULL   | YES  | BTREE      |         |
| memories |          1 | index_execution_time |            1 | execution_time | A         |        2344 |     NULL | NULL   | YES  | BTREE      |         |
| memories |          1 | index_memory         |            1 | memory         | A         |     8473766 |     NULL | NULL   | YES  | BTREE      |         |
| memories |          1 | index_site_id        |            1 | site_id        | A         |          16 |     NULL | NULL   |      | BTREE      |         |
| memories |          1 | index_q              |            1 | q              | A         |      338950 |     NULL | NULL   | YES  | BTREE      |         |
+----------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+

It stores over a million records for many different sites (site_id). For a given site, there might be 20,000 rows. The information stored is performance metrics for individual page requests. In case it matters, the non-obvious fields: memory is how much memory the script used, q is the request path, and site_id is a reference to a Sites table.

I have two slow queries that I run on this data. The first gets the 25 biggest memory-hog pages:

Select 
  Memory.q, count(*) as count, 
  AVG(Memory.memory) as average_memory, 
  MAX(Memory.memory) as peak_memory,
  AVG(Memory.execution_time) as average_execution_time,
  MAX(Memory.execution_time) as peak_execution_time 
FROM Memory 
WHERE site_id = $some_site_id 
GROUP BY Memory.q
ORDER BY average_memory DESC
LIMIT 25

The second query gets the 25 slowest pages on average for a given site:

Select 
  Memory.q, count(*) as count, 
  AVG(Memory.memory) as average_memory, 
  MAX(Memory.memory) as peak_memory,
  AVG(Memory.execution_time) as average_execution_time,
  MAX(Memory.execution_time) as peak_execution_time 
FROM Memory 
WHERE site_id = $some_site_id 
GROUP BY Memory.q
ORDER BY average_execution_time DESC
LIMIT 25

I recently converted the table from MyISAM to InnoDB, so that these reads would not lock the table. This was causing operations that update this table to queue up and lag.
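
For reference, the conversion itself was a single (long-running) ALTER along these lines; this is a sketch rather than a copy of exactly what I ran, using the memories table name from the index listing above:

-- Rebuilds the whole table, so it takes a while at this size.
ALTER TABLE memories ENGINE = InnoDB;

-- Confirm the engine afterwards.
SHOW TABLE STATUS LIKE 'memories';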

Beyond throwing more RAM at the problem (to increase the InnoDB buffer pool size), I want to see if there are other options. I've never worked with a NoSQL database, but from what I understand it won't be of much help here because I rely on aggregate functions in my queries.
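
For what it's worth, this is roughly how I'm checking the current buffer pool situation; the 2G value below is just a placeholder, not my actual setting:

-- Current buffer pool size and usage.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW STATUS LIKE 'Innodb_buffer_pool_pages%';

-- Changing it means editing my.cnf and restarting mysqld, for example:
-- [mysqld]
-- innodb_buffer_pool_size = 2G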

The app is written in PHP, if it matters.

Any ideas for a better way to approach the storage and analysis of this data?

Update:

Profiling the query shows the slowness is almost entirely in the "Copying to tmp table" step. I will research how to make this step faster.

+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000030 |
| checking query cache for query | 0.000065 |
| Opening tables                 | 0.000013 |
| System lock                    | 0.000004 |
| Table lock                     | 0.000014 |
| init                           | 0.000032 |
| optimizing                     | 0.000010 |
| statistics                     | 0.008119 |
| preparing                      | 0.000042 |
| Creating tmp table             | 0.000317 |
| executing                      | 0.000005 |
| Copying to tmp table           | 5.349280 |
| Sorting result                 | 0.006511 |
| Sending data                   | 0.000092 |
| end                            | 0.000005 |
| removing tmp table             | 0.001510 |
| end                            | 0.000007 |
| query end                      | 0.000004 |
| freeing items                  | 0.001163 |
| logging slow query             | 0.000006 |
| cleaning up                    | 0.000006 |
+--------------------------------+----------+
21 rows in set (0.01 sec)

mysql> show profile cpu for query 4;
+--------------------------------+----------+----------+------------+
| Status                         | Duration | CPU_user | CPU_system |
+--------------------------------+----------+----------+------------+
| starting                       | 0.000030 | 0.000000 |   0.000000 |
| checking query cache for query | 0.000065 | 0.000000 |   0.000000 |
| Opening tables                 | 0.000013 | 0.000000 |   0.000000 |
| System lock                    | 0.000004 | 0.000000 |   0.000000 |
| Table lock                     | 0.000014 | 0.000000 |   0.000000 |
| init                           | 0.000032 | 0.000000 |   0.000000 |
| optimizing                     | 0.000010 | 0.000000 |   0.000000 |
| statistics                     | 0.008119 | 0.001000 |   0.000000 |
| preparing                      | 0.000042 | 0.000000 |   0.000000 |
| Creating tmp table             | 0.000317 | 0.000000 |   0.000000 |
| executing                      | 0.000005 | 0.000000 |   0.000000 |
| Copying to tmp table           | 5.349280 | 0.687896 |   0.412937 |
| Sorting result                 | 0.006511 | 0.004999 |   0.001999 |
| Sending data                   | 0.000092 | 0.000000 |   0.000000 |
| end                            | 0.000005 | 0.000000 |   0.000000 |
| removing tmp table             | 0.001510 | 0.000000 |   0.001000 |
| end                            | 0.000007 | 0.000000 |   0.000000 |
| query end                      | 0.000004 | 0.000000 |   0.000000 |
| freeing items                  | 0.001163 | 0.000000 |   0.001000 |
| logging slow query             | 0.000006 | 0.000000 |   0.000000 |
| cleaning up                    | 0.000006 | 0.000000 |   0.000000 |
+--------------------------------+----------+----------+------------+
Justin
  • Does the calculation need to be done in real-time? Could you just periodically dump the data and then process it off-line? – robbrit Dec 13 '11 at 19:53
  • Thanks for asking. This data does not need to be run in real-time. I already have an hourly process that does this calculation on 1/24 of the sites and inserts the results into a reporting table that is what is read in real time in the app. However, this rolling process is bogging the system down. – Justin Dec 13 '11 at 19:55
  • What you can do with MySQL is have a "slave" database which is a separate database from the main one but clones the data. Then you can run your queries on the slave database without affecting the main one. – robbrit Dec 13 '11 at 20:05
  • @robbrit - Thanks, I have set up slaves before. I was hoping there was a way to avoid throwing more hardware at the problem. – Justin Dec 15 '11 at 16:03
  • At this point, I think I might just need more ram, judging by the mysql tuner output. – Justin Dec 15 '11 at 16:11

7 Answers

4

You don't show your key structure, though the MUL flag in your column listing suggests site_id may be part of a multi-part key. Note that if it's not the FIRST field in that multi-part key, then the key cannot be used for that WHERE clause. E.g. if you have

KEY somekey (field1, site_id, field3, ...)

then your WHERE clause must include BOTH field1 and site_id for that key to be usable in the query. You don't have to use the fields in the same order they're listed in the key (where site_id=... and field1=... will work the same as where field1=... and site_id=...), but since field1 appears before site_id in the key's definition, you must include it as well for the entire key to be usable.

The same holds for your q field. It must also be first in the keys being covered, or those keys are unusable.
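
To make that concrete, here is a sketch; the index names are made up, and this isn't necessarily the right index for your workload:

-- Hypothetical multi-part key where site_id is NOT the leading column;
-- "WHERE site_id = ..." alone cannot use it:
--   KEY idx_q_site (q, site_id)

-- A key that leads with site_id CAN be used by that WHERE clause,
-- and (site_id, q) also helps the GROUP BY q in your queries:
ALTER TABLE memories ADD KEY idx_site_q (site_id, q);

-- Check which key the optimizer actually picks (890 taken from your sample rows):
EXPLAIN SELECT q, AVG(memory) FROM memories WHERE site_id = 890 GROUP BY q;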

Marc B
  • Thanks. Can you explain why this answer is relevant? I mean, the queries work fine, they are just slow. Maybe I'm just not clear. – Justin Dec 13 '11 at 19:56
  • One reason for slowness is the absence of keys, or misconfigured keys. This becomes very relevant on large datasets, when mysql has to scan the entire table (possibly repeatedly) to do its work, whereas having an index available will cut the time required by huge chunks. – Marc B Dec 13 '11 at 20:28
  • Thanks! The table does have indexes, though I'm not an expert on indexes so they might be wrong. I added them to the question. Do they look right? – Justin Dec 15 '11 at 16:04
3

To design InnoDB tables effectively, you need to understand how InnoDB uses indexes - in particular, what clustered indexes are and how they work.

Background reading

Please take some time to read the following articles and previous answers of mine:

You may also find this presentation of interest:

Now that you have a better understanding of the InnoDB architecture, we'll take a look at how to optimise your model for the InnoDB engine.

As you've only provided two example queries, I've had to make certain assumptions, so the following design is optimised for queries that cover site_id and path. I'll leave it to you to modify the design further (if required), as you know your data better than I do.

Revised Schema (Simplified)

I've modified your design and created 3 tables: site, site_request and site_request_metric.

Site Table (1024 rows)

drop table if exists site;
create table site
(
site_id smallint unsigned not null auto_increment primary key,
url varchar(255) unique not null,
next_request_id int unsigned not null default 0
)
engine=innodb;

select count(*) from site;
+----------+
| count(*) |
+----------+
|     1024 |
+----------+

Site Table - Sample Data

+---------+------------------+-----------------+
| site_id | url              | next_request_id |
+---------+------------------+-----------------+
|       1 | www.site1.com    |             167 |
|       2 | www.site2.com    |             177 |
|       3 | www.site3.com    |              68 |
...
|    1022 | www.site1022.com |             203 |
|    1023 | www.site1023.com |              80 |
|    1024 | www.site1024.com |             239 |
+---------+------------------+-----------------+

Most of the above fields are self-explanatory; next_request_id is a counter field which records how many requests (path, or q in your example) a given site has. For example, site 1024 has 239 individual page requests/paths against which we want to record memory and execution metrics.

Also pay attention to the numeric data types I have used - most of yours are poorly defined, as you seem to confuse the optional display width specifier (only used with zerofill) with the size of the integer. It is important to choose the smallest data type possible so we can pack more data into the InnoDB buffer pool.

http://dev.mysql.com/doc/refman/5.0/en/integer-types.html
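
A quick illustration of the display-width point (the table name is made up purely for the demo):

create table int_size_demo
(
  a int(250),            -- still a 4-byte INT; (250) is only a display width
  b int unsigned,        -- 4 bytes, 0 to 4294967295
  c mediumint unsigned,  -- 3 bytes, 0 to 16777215
  d smallint unsigned    -- 2 bytes, 0 to 65535 (plenty for ~1024 sites)
)
engine=innodb;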

Site Request Table (192K rows)

drop table if exists site_request;
create table site_request
(
site_id smallint unsigned not null,
request_id int unsigned not null,
created_date datetime not null,
path varchar(255) not null,
next_metric_id int unsigned not null default 0,
primary key (site_id, request_id)
)
engine=innodb;

select count(*) from site_request;
+----------+
| count(*) |
+----------+
|   192336 |
+----------+

Site Request Table - Sample Data

+---------+------------+---------------------+----------------------+----------------+
| site_id | request_id | created_date        | path                 | next_metric_id |
+---------+------------+---------------------+----------------------+----------------+
|       1 |          1 | 2011-12-14 17:17:41 | www.site1.com/1      |            250 |
|       1 |          2 | 2011-12-14 17:17:41 | www.site1.com/2      |            132 |
|       1 |          3 | 2011-12-14 17:17:41 | www.site1.com/3      |            345 |
...
|       1 |         166| 2011-12-14 17:17:41 | www.site1.com/166    |            342 |
|       1 |         167| 2011-12-14 17:17:41 | www.site1.com/167    |            231 |
...
|    1024 |          1 | 2011-12-14 17:17:58 | www.site1024.com/1   |            241 |
|    1024 |          2 | 2011-12-14 17:17:58 | www.site1024.com/2   |            266 |
...
|    1024 |        236 | 2011-12-14 17:17:58 | www.site1024.com/236 |            466 |
|    1024 |        237 | 2011-12-14 17:17:58 | www.site1024.com/237 |            459 |
|    1024 |        238 | 2011-12-14 17:17:58 | www.site1024.com/238 |            389 |
|    1024 |        239 | 2011-12-14 17:17:58 | www.site1024.com/239 |            592 |
+---------+------------+---------------------+----------------------+----------------+

Again, most of the fields are self-explanatory. The primary key of this table is a composite of site_id and request_id, so site 1 has 167 individual requests/paths and site 1024 has 239.

To select an individual request you have to specify both site_id and request_id:

select * from site_request where site_id = 1 and request_id = 167
+---------+------------+---------------------+-------------------+----------------+
| site_id | request_id | created_date        | path              | next_metric_id |
+---------+------------+---------------------+-------------------+----------------+
|       1 |        167 | 2011-12-14 17:17:41 | www.site1.com/167 |            231 |
+---------+------------+---------------------+-------------------+----------------+
1 row in set (0.00 sec)

select * from site_request where site_id = 1024 and request_id = 167
+---------+------------+---------------------+----------------------+----------------+
| site_id | request_id | created_date        | path                 | next_metric_id |
+---------+------------+---------------------+----------------------+----------------+
|    1024 |        167 | 2011-12-14 17:17:58 | www.site1024.com/167 |            175 |
+---------+------------+---------------------+----------------------+----------------+
1 row in set (0.00 sec)

To add a new request to a site, we use site.next_request_id + 1 to generate the next composite primary key value for the given site_id. This is normally done using a trigger as follows:

delimiter #

create trigger site_request_before_ins_trig before insert on site_request
for each row
begin
declare v_id int unsigned default 0;

  select next_request_id + 1 into v_id from site where site_id = new.site_id;
  set new.request_id = v_id, new.created_date = now();
  update site set next_request_id = v_id where site_id = new.site_id;
end#

delimiter ;
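
With the trigger in place, a new request row only really needs site_id and path; the placeholder values below get overwritten by the trigger (the path is made up for this sketch):

-- request_id and created_date are assigned by the BEFORE INSERT trigger.
insert into site_request (site_id, request_id, created_date, path) values
(1, 0, now(), 'www.site1.com/168');

-- The per-site counter has been bumped as well.
select site_id, next_request_id from site where site_id = 1;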

Why didn't I just create an auto_increment primary key and a secondary index on site_id?

create table site_request
(
request_id int unsigned not null auto_increment primary key,
site_id smallint unsigned not null,
...
key (site_id)
)
engine=innodb;

Well, I've made the assumption that most of your queries cover site_id and path, so clustering the requests table on site_id is a worthwhile optimisation even though insert overhead will be slightly increased. I'm more concerned about read performance, especially as this table will be joined with the HUGE metrics table later on.

Site Request Metric Table (74 Million rows)

drop table if exists site_request_metric;
create table site_request_metric
(
site_id smallint unsigned not null,
request_id int unsigned not null,
metric_id int unsigned not null,
created_date datetime not null,
memory_usage int unsigned not null default 0,
execution_time mediumint unsigned not null default 0,
primary key (site_id, request_id, metric_id)
)
engine=innodb;

select count(*) from site_request_metric;
+----------+
| count(*) |
+----------+
| 73858764 |
+----------+

Site Request Metric Table - Sample Data

+---------+------------+-----------+---------------------+--------------+----------------+
| site_id | request_id | metric_id | created_date        | memory_usage | execution_time |
+---------+------------+-----------+---------------------+--------------+----------------+
|       1 |          1 |         1 | 2011-12-14 17:17:58 |     18052380 |           7731 |
|       1 |          1 |         2 | 2011-12-14 17:17:58 |     32013204 |           7881 |
|       1 |          1 |         3 | 2011-12-14 17:17:58 |     55779470 |           7274 |
...
|       1 |          1 |       249 | 2011-12-14 17:17:58 |     11527748 |           5126 |
|       1 |          1 |       248 | 2011-12-14 17:17:58 |     19457506 |           4097 |
|       1 |          1 |       247 | 2011-12-14 17:17:58 |     23129432 |           6202 |
...
|     997 |          1 |         1 | 2011-12-14 19:08:48 |     38584043 |           7156 |
|     997 |          1 |         2 | 2011-12-14 19:08:48 |     68884314 |           2185 |
|     997 |          1 |         3 | 2011-12-14 19:08:48 |     31545597 |            207 |
...
|     997 |          1 |       380 | 2011-12-14 19:08:49 |     39123978 |            166 |
|     997 |          1 |       381 | 2011-12-14 19:08:49 |     45114404 |           7310 |
|     997 |          1 |       382 | 2011-12-14 19:08:49 |     55057884 |            506 |
+---------+------------+-----------+---------------------+--------------+----------------+

The site_request.next_metric_id field works in a similar way to the site.next_request_id counter field and is maintained using a trigger.

delimiter #

create trigger site_request_metric_before_ins_trig before insert on site_request_metric
for each row
begin
declare v_id int unsigned default 0;

  select next_metric_id + 1 into v_id from site_request where site_id = new.site_id and request_id = new.request_id;
  set new.metric_id = v_id, new.created_date = now();
  update site_request set next_metric_id = v_id where site_id = new.site_id and request_id = new.request_id;
end#

delimiter ;

Schema Performance

Taking site 997 as an example:

select * from site where site_id = 997;
+---------+-----------------+-----------------+
| site_id | url             | next_request_id |
+---------+-----------------+-----------------+
|     997 | www.site997.com |             319 |
+---------+-----------------+-----------------+
1 row in set (0.00 sec)

Site 997 has 319 individual page requests/paths.

select * from site_request where site_id = 997;
+---------+------------+---------------------+---------------------+----------------+
| site_id | request_id | created_date        | path                | next_metric_id |
+---------+------------+---------------------+---------------------+----------------+
|     997 |          1 | 2011-12-14 17:17:58 | www.site997.com/1   |            383 |
|     997 |          2 | 2011-12-14 17:17:58 | www.site997.com/2   |            262 |
|     997 |          3 | 2011-12-14 17:17:58 | www.site997.com/3   |            470 |
|     997 |          4 | 2011-12-14 17:17:58 | www.site997.com/4   |            247 |
...
|     997 |        316 | 2011-12-14 17:17:58 | www.site997.com/316 |            176 |
|     997 |        317 | 2011-12-14 17:17:58 | www.site997.com/317 |            441 |
|     997 |        318 | 2011-12-14 17:17:58 | www.site997.com/318 |            419 |
|     997 |        319 | 2011-12-14 17:17:58 | www.site997.com/319 |            601 |
+---------+------------+---------------------+---------------------+----------------+
319 rows in set (0.00 sec)

How many metrics do we have for all of site 997's requests?

select sum(next_metric_id) from site_request where site_id = 997;
+---------------------+
| sum(next_metric_id) |
+---------------------+
|              130163 |
+---------------------+
1 row in set (0.00 sec)

Summing the next_metric_id (as above) for this site is faster than the usual:

select count(*) from site_request_metric where site_id = 997;
+----------+
| count(*) |
+----------+
|   130163 |
+----------+
1 row in set (0.03 sec)

So site 997 has approx 130K memory and execution time metrics to analyse within a table of approx. 74 million rows.

Let's try our main queries next (all run-times are cold, i.e. MySQL restarted, empty buffers and no query cache!):

Memory

select
 hog.*,
 sr.path
from
(
select 
 srm.site_id,
 srm.request_id,
 count(*) as counter, 
 avg(srm.memory_usage) as average_memory, 
 max(srm.memory_usage) as peak_memory,
 avg(srm.execution_time) as average_execution_time,
 max(srm.execution_time) as peak_execution_time 
from
 site_request_metric srm
where
 srm.site_id = 997
group by 
 srm.site_id,
 srm.request_id
order by
 average_memory desc
limit 25
) hog
inner join site_request sr on hog.site_id = sr.site_id and hog.request_id = sr.request_id;

Results as follows:

+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
| site_id | request_id | counter | average_memory | peak_memory | average_execution_time | peak_execution_time | path                |
+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
|     997 |        103 |     184 |  43381803.4293 |    69682361 |              4378.1630 |                8069 | www.site997.com/103 |
|     997 |        151 |     158 |  42594703.1392 |    69329761 |              4422.8481 |                8080 | www.site997.com/151 |
|     997 |        192 |     509 |  42470135.3360 |    69927112 |              4083.1198 |                8098 | www.site997.com/192 |
|     997 |        248 |     161 |  42169276.5590 |    69995565 |              4118.1180 |                7949 | www.site997.com/248 |
|     997 |        221 |     162 |  42156708.4877 |    69233026 |              4151.1667 |                8022 | www.site997.com/221 |
|     997 |        136 |     154 |  42026979.3831 |    69897045 |              4060.5649 |                8098 | www.site997.com/136 |
|     997 |        239 |     424 |  41979697.9788 |    69381215 |              4463.0189 |                8087 | www.site997.com/239 |
|     997 |         77 |     338 |  41864013.0266 |    69991164 |              3942.4142 |                8067 | www.site997.com/77  |
|     997 |        283 |     249 |  41853642.9157 |    69945794 |              3915.7028 |                8034 | www.site997.com/283 |
|     997 |          5 |     228 |  41815274.7851 |    69825743 |              3898.4123 |                8078 | www.site997.com/5   |
|     997 |        216 |     319 |  41766464.5078 |    69777901 |              3899.0752 |                8091 | www.site997.com/216 |
|     997 |        131 |     170 |  41720890.5118 |    69892577 |              4074.2588 |                8097 | www.site997.com/131 |
|     997 |        160 |     385 |  41702556.6545 |    69868379 |              4060.2727 |                8093 | www.site997.com/160 |
|     997 |        245 |     200 |  41683505.3900 |    69668739 |              4052.7950 |                8095 | www.site997.com/245 |
|     997 |         70 |     429 |  41640396.0466 |    69988619 |              3995.3310 |                8099 | www.site997.com/70  |
|     997 |         98 |     485 |  41553544.7649 |    69957698 |              4048.1443 |                8096 | www.site997.com/98  |
|     997 |        153 |     301 |  41542909.4651 |    69754024 |              3884.7409 |                8028 | www.site997.com/153 |
|     997 |        226 |     429 |  41523530.3939 |    69691453 |              4097.7226 |                8096 | www.site997.com/226 |
|     997 |         31 |     478 |  41442100.4435 |    69802248 |              3999.3096 |                8098 | www.site997.com/31  |
|     997 |        171 |     222 |  41405805.8153 |    69433643 |              4364.4414 |                8087 | www.site997.com/171 |
|     997 |        150 |     336 |  41393538.5744 |    69746950 |              4264.5655 |                8077 | www.site997.com/150 |
|     997 |        167 |     526 |  41391595.5741 |    69633242 |              4206.1597 |                8096 | www.site997.com/167 |
|     997 |        182 |     593 |  41288151.5379 |    69992913 |              4351.6476 |                8099 | www.site997.com/182 |
|     997 |         14 |     555 |  41239680.5387 |    69976632 |              4054.6126 |                8084 | www.site997.com/14  |
|     997 |        297 |     410 |  41163572.3805 |    69874576 |              4001.0829 |                8039 | www.site997.com/297 |
+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
25 rows in set (0.41 sec)

Execution time

select
 hog.*,
 sr.path
from
(
select 
 srm.site_id,
 srm.request_id,
 count(*) as counter, 
 avg(srm.memory_usage) as average_memory, 
 max(srm.memory_usage) as peak_memory,
 avg(srm.execution_time) as average_execution_time,
 max(srm.execution_time) as peak_execution_time 
from
 site_request_metric srm
where
 srm.site_id = 997
group by 
 srm.site_id,
 srm.request_id
order by
 average_execution_time desc
limit 25
) hog
inner join site_request sr on hog.site_id = sr.site_id and hog.request_id = sr.request_id;

Results as follows:

+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
| site_id | request_id | counter | average_memory | peak_memory | average_execution_time | peak_execution_time | path                |
+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
|     997 |        213 |     159 |  37962517.1321 |    67120491 |              4497.9119 |                8055 | www.site997.com/213 |
|     997 |        239 |     424 |  41979697.9788 |    69381215 |              4463.0189 |                8087 | www.site997.com/239 |
|     997 |        151 |     158 |  42594703.1392 |    69329761 |              4422.8481 |                8080 | www.site997.com/151 |
|     997 |        289 |     382 |  39227749.9869 |    69715783 |              4402.8927 |                8093 | www.site997.com/289 |
|     997 |         69 |     473 |  40099817.4715 |    69798587 |              4380.6850 |                8092 | www.site997.com/69  |
|     997 |        103 |     184 |  43381803.4293 |    69682361 |              4378.1630 |                8069 | www.site997.com/103 |
|     997 |        183 |     236 |  40111564.1356 |    69853507 |              4376.4280 |                8032 | www.site997.com/183 |
|     997 |        171 |     222 |  41405805.8153 |    69433643 |              4364.4414 |                8087 | www.site997.com/171 |
|     997 |         58 |     212 |  39289163.9057 |    69861740 |              4355.8396 |                8087 | www.site997.com/58  |
|     997 |         71 |     388 |  39895200.6108 |    69801188 |              4353.9639 |                8086 | www.site997.com/71  |
|     997 |        182 |     593 |  41288151.5379 |    69992913 |              4351.6476 |                8099 | www.site997.com/182 |
|     997 |        195 |     305 |  39780792.6066 |    69824981 |              4343.0295 |                8081 | www.site997.com/195 |
|     997 |        318 |     419 |  39860696.4415 |    69958266 |              4323.6420 |                8071 | www.site997.com/318 |
|     997 |        303 |     318 |  39357663.3899 |    69850523 |              4322.4686 |                8097 | www.site997.com/303 |
|     997 |        198 |     306 |  38990104.1699 |    69851817 |              4320.0621 |                8088 | www.site997.com/198 |
|     997 |        286 |     227 |  39654671.5859 |    69871305 |              4307.8811 |                8055 | www.site997.com/286 |
|     997 |        105 |     611 |  39055749.5008 |    69813117 |              4296.0802 |                8090 | www.site997.com/105 |
|     997 |        298 |     388 |  40150371.2474 |    69985665 |              4286.9716 |                8095 | www.site997.com/298 |
|     997 |         84 |     517 |  39520438.9497 |    69990404 |              4283.3578 |                8098 | www.site997.com/84  |
|     997 |        106 |     448 |  41099495.4018 |    69902616 |              4282.6094 |                8082 | www.site997.com/106 |
|     997 |        237 |     431 |  39017341.3387 |    69623443 |              4277.4872 |                8071 | www.site997.com/237 |
|     997 |         55 |     381 |  39603109.8294 |    69750984 |              4269.1969 |                8095 | www.site997.com/55  |
|     997 |         34 |     438 |  40697744.4087 |    69843517 |              4266.3288 |                8047 | www.site997.com/34  |
|     997 |         38 |     433 |  40169799.8291 |    69898182 |              4266.1663 |                8088 | www.site997.com/38  |
|     997 |        150 |     336 |  41393538.5744 |    69746950 |              4264.5655 |                8077 | www.site997.com/150 |
+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
25 rows in set (0.30 sec)

So that's a sub-0.5-second cold runtime for both queries against a table that contains approx. 74 million rows (subsequent runtimes are approx. 0.06 seconds).

This answer isn't meant to be a definitive one, as there are many other factors that could affect the table and index design which I have not considered. It should, however, provide you with some insight into how simple table/index designs can dramatically improve InnoDB query performance.

Hope this helps :)

Full script here: http://pastie.org/3022142

Jon Black
2

I would start by profiling the query with the built-in profiler:

mysql> SET profiling = 1;
mysql> <your query>;
mysql> SHOW PROFILES;
mysql> SHOW PROFILE FOR QUERY <id of your query>;
mysql> SHOW PROFILE CPU FOR QUERY <id of your query>;

Note that profiling isn't free, so do it when the site can handle it, perhaps on a replica of the live system.

Andreas Wederbrand
0

I would add another field containing an MD5 hash of q and use the values of that field for grouping.

It is not a good idea to have an index on varchar(250) and group by the values of the field.

And you will need a composite index on (site_id, q_hash).
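
Something along these lines (the column and index names are just examples, and the hash could equally be set from PHP or a trigger rather than backfilled in SQL):

-- Add a fixed-width hash column plus the composite index.
ALTER TABLE memories
  ADD COLUMN q_hash CHAR(32) NULL,
  ADD INDEX index_site_qhash (site_id, q_hash);

-- Backfill existing rows.
UPDATE memories SET q_hash = MD5(q);

-- Then group on the hash instead of the long varchar.
SELECT q, COUNT(*) AS count, AVG(memory) AS average_memory
FROM memories
WHERE site_id = $some_site_id
GROUP BY q_hash
LIMIT 25;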

newtover
0

If I'm reading your question (and the comments) correctly, the problem is that these queries are bogging the system down.

Other answers point you in good directions for optimization (fix your indices, use the profiler, etc.).

An additional tactic would be to set up replication, and run these heavy-duty queries against the slave. The master will hum along, writing to the binlog, and the slave will catch up once the queries complete. This setup allows you to hammer the slave with long-running queries without affecting the write-performance of the master.
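
A minimal sketch of the slave side, assuming binary logging and a replication user are already set up on the master and the data has been copied across (all values below are placeholders):

CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;

START SLAVE;

-- Then point the heavy reporting queries at this slave only.
SHOW SLAVE STATUS\G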

timdev
0

What you really need are two good indexes to support the queries you presented.

The indexes you have at present are not adequate because data will still be retrieved from the table in conjunction with whichever index the MySQL Query Optimizer will decide to pick.

@MarkB's answer is in theory what you want (+1 for @MarkB). You just need to make the index fit these criteria for any given query:

  1. WHERE clause
  2. ORDER BY clause
  3. GROUP BY clause
  4. Necessary Columns (not in WHERE, ORDER BY, or GROUP BY)

Let's take your first query:

Select  
  Memory.q, count(*) as count,  
  AVG(Memory.memory) as average_memory,  
  MAX(Memory.memory) as peak_memory, 
  AVG(Memory.execution_time) as average_execution_time, 
  MAX(Memory.execution_time) as peak_execution_time  
FROM Memory  
WHERE site_id = $some_site_id  
GROUP BY Memory.q
ORDER BY average_memory DESC
LIMIT 25 

Look at the four criteria:

  • WHERE has a single value, [site_id]
  • ORDER BY will order within WHERE, [average_memory]
  • GROUP BY will order within ORDER BY, [q]
  • Necessary Columns : [memory],[execution_time]

Everything in brackets is what you put in the index, in the order shown. Here is the index:

ALTER TABLE Memory ADD INDEX siteid_q_mem_exectime_index
(site_id,q,memory,execution_time);

Notice that average_memory is not a table column; it is derived from the memory field.

Now, do the same thing with the second query:

Select       
  Memory.q, count(*) as count,       
  AVG(Memory.memory) as average_memory,       
  MAX(Memory.memory) as peak_memory,      
  AVG(Memory.execution_time) as average_execution_time,      
  MAX(Memory.execution_time) as peak_execution_time       
FROM Memory       
WHERE site_id = $some_site_id       
GROUP BY Memory.q
ORDER BY average_execution_time DESC
LIMIT 25      

Look at the four criteria:

  • WHERE has a single value, [site_id]
  • ORDER BY will order within WHERE, [average_execution_time]
  • GROUP BY will order within ORDER BY, [q]
  • Necessary Columns : [memory],[execution_time]

The result would be the same set of columns as before. Therefore, you do not need another index.

Here it is again:

ALTER TABLE Memory ADD INDEX siteid_q_mem_exectime_index
(site_id,q,memory,execution_time);

Why is this index so vital?

  • ORDER BY and GROUP BY usually trigger internal sorting operations for temp tables. If the table is properly indexed, the data is already sorted as needed when traversing the index.
  • The necessary columns (memory, execution_time) are in the index for a good reason. If the index has every column needed for the result set, MySQL will not touch the table at all; it will read the needed data from the index only. This reduces disk I/O.

Indexes created in this manner are known as "Covering Indexes".
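
You can verify the optimizer is treating it as a covering index with EXPLAIN; "Using index" in the Extra column means the table itself is never touched. A sketch against your first query, with 890 taken from your sample rows:

EXPLAIN
SELECT Memory.q, COUNT(*) AS count,
       AVG(Memory.memory) AS average_memory,
       MAX(Memory.memory) AS peak_memory,
       AVG(Memory.execution_time) AS average_execution_time,
       MAX(Memory.execution_time) AS peak_execution_time
FROM Memory
WHERE site_id = 890
GROUP BY Memory.q;
-- Look for key = siteid_q_mem_exectime_index and "Using index" under Extra.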

Here are some nice links about this subject. Enjoy !!!

RolandoMySQLDBA
-1

First of all, from what I see, you should avoid GROUP BY - it takes a lot of memory. Just break it into two queries. Also, create indexes as Marc B advised.

Timur