11

I'm currently trying to improve the speed of SELECTS for a MySQL table and would appreciate any suggestions on ways to improve it.

We have over 300 million records in the table and the table has the structure tag, date, value. The primary key is a combined key of tag and date. The table contains information for about 600 unique tags most containing an average of about 400,000 rows but can range from 2000 to over 11 million rows.

The queries run against the table are:

  SELECT date,
         value 
    FROM table 
   WHERE tag = "a" 
     AND date BETWEEN 'x' and 'y' 
ORDER BY date

....and there are very few if any INSERTS.

I have tried partitioning the data by tag into various number of partitions but this seems to have little increase in speed.

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
allyLogan
  • 403
  • 2
  • 6
  • 13
  • When you leave out the `ORDER BY`, does that help? Can you post actual timing of queries with and without the ORDER BY? – S.Lott Jan 23 '11 at 18:24
  • 5
    post an explain of the query by doing EXPLAIN SELECT date,value FROM table where tag = "a" and date BETWEEN 'x' and 'y' order by date – piyush Jan 23 '11 at 18:24
  • 3
    You haven't mentioned indexes - try either one per column (tag, date, value), or a single composite index containing all three. Be aware that column order matters in a composite index -- starting from the left, if the column isn't referenced in the query, the index won't be used. – OMG Ponies Jan 23 '11 at 18:28
  • @Ponies: the Primary Key declaration will have created an index on (tag, date), presumably? – Larry Lustig Jan 23 '11 at 19:56
  • Explain Query result: id:'1', select_type:'SIMPLE', table:'table', type:'range', possible_keys:'PRIMARY', key:'PRIMARY', key_len:'85', ref:NULL, rows:'29559', Extra:'Using where' – allyLogan Jan 24 '11 at 08:46
  • I don't see any detail about the speed of this query. Is it executed often? What is the average run-time? What is the target run-time? The best way to optimize a query is not to execute it... for example, by caching the result... – pascal Jan 24 '11 at 12:41

8 Answers8

5

take time to read my answer here: (has similar volumes to yours)

500 millions rows, 15 million row range scan in 0.02 seconds.

MySQL and NoSQL: Help me to choose the right one

then amend your table engine to innodb as follows:

create table tag_date_value
(
tag_id smallint unsigned not null, -- i prefer ints to chars
tag_date datetime not null, -- can we make this date vs datetime ?
value int unsigned not null default 0, -- or whatever datatype you require
primary key (tag_id, tag_date) -- clustered composite PK
)
engine=innodb;

you might consider the following as the primary key instead:

primary key (tag_id, tag_date, value) -- added value save some I/O

but only if value isnt some LARGE varchar type !

query as before:

select
 tag_date, 
 value
from
 tag_date_value
where
 tag_id = 1 and
 tag_date between 'x' and 'y'
order by
 tag_date;

hope this helps :)

EDIT

oh forgot to mention - dont use alter table to change engine type from mysiam to innodb but rather dump the data out into csv files and re-import into a newly created and empty innodb table.

note i'm ordering the data during the export process - clustered indexes are the KEY !

Export

select * into outfile 'tag_dat_value_001.dat' 
fields terminated by '|' optionally enclosed by '"'
lines terminated by '\r\n'
from
 tag_date_value
where
 tag_id between 1 and 50
order by
 tag_id, tag_date;

select * into outfile 'tag_dat_value_002.dat' 
fields terminated by '|' optionally enclosed by '"'
lines terminated by '\r\n'
from
 tag_date_value
where
 tag_id between 51 and 100
order by
 tag_id, tag_date;

-- etc...

Import

import back into the table in correct order !

start transaction;

load data infile 'tag_dat_value_001.dat' 
into table tag_date_value
fields terminated by '|' optionally enclosed by '"'
lines terminated by '\r\n'
(
tag_id,
tag_date,
value
);

commit;

-- etc...
Community
  • 1
  • 1
Jon Black
  • 16,223
  • 5
  • 43
  • 42
1

What is the cardinality of the date field (that is, how many different values appear in that field)? If the date BETWEEN 'x' AND 'y' is more limiting than the tag = 'a' part of the WHERE clause, try making your primary key (date, tag) instead of (tag, date), allowing date to be used as an indexed value.

Also, be careful how you specify 'x' and 'y' in your WHERE clause. There are some circumstances in which MySQL will cast each date field to match the non-date implied type of the values you compare to.

Larry Lustig
  • 49,320
  • 14
  • 110
  • 160
  • There are lot more different values for the date column than the tag column as the date column can have a value every couple of seconds and it is not predictable and the tag column has only 600 different values – allyLogan Jan 24 '11 at 09:25
  • Depending on how wide your x to y date range is, you might get considerably better performance with by trying (date, tag) as the primary key. Give it a try. – Larry Lustig Jan 24 '11 at 13:55
  • whilst we call the date 'date' it is actually a timestamp with values every second or so and some tags have 11 million values so the date range is pretty wide. I tried to create a new table with (date,tag) as primary key but after about 12 hours of data upload it was still only a 6th of the way through!! – allyLogan Jan 25 '11 at 10:15
  • The increased cardinality of the a timestamp actual makes it a _good_ candidate for being first in the primary key, but only if the searches you perform return a relatively small range of timestamp values. So it sounds like my suggestion will not, in the end, help you out. – Larry Lustig Jan 25 '11 at 12:51
1

I would do two things - first throw some indexes on there around tag and date as suggested above:

alter table table add index (tag, date);

Next break your query into a main query and sub-select in which you are narrowing your results down when you get into your main query:

SELECT date, value
FROM table
WHERE date BETWEEN 'x' and 'y'
AND tag IN ( SELECT tag FROM table WHERE tag = 'a' )
ORDER BY date
OregonJeff
  • 13
  • 2
1

Your query is asking for a few things - and with that high # of rows, the look of the data can change what the best approach is.

   SELECT date, value 
   FROM table 
   WHERE tag = "a" 
     AND date BETWEEN 'x' and 'y' 
   ORDER BY date

There are a few things that can slow down this select query.

  1. A very large result set that has to be sorted (order by).
  2. A very large result set. If tag and date are in the index (and let's assume that's as good as it gets) every result row will have to leave the index to lookup the value field. Think of this like needing the first sentence of each chapter of a book. If you only needed to know the chapter names, easy - you can get it from the table of contents, but since you need the first sentence you have to go to the actual chapter. In certain cases, the optimizer may choose just to flip through the entire book (table scan in query plan lingo) to get those first sentences.
  3. Filtering by the wrong where clause first. If the index is in the order tag, date... then tag should (for a majority of your queries) be the more stringent of the two columns. So basically, unless you have more tags than dates (or maybe than dates in a typical date range), then dates should be the first of the two columns in your index.

A couple of recommendations:

  1. Consider if it's possible to truncate some of that data if it's too old to care about most of the time.
  2. Try playing with your current index - i.e. change the order of the items in it.
  3. Do away with your current index and replace it with a covering index (has all 3 fields in it)
  4. Run some EXPLAIN's and make sure it's using your index at all.
  5. Switch to some other data store (mongo db?) or otherwise ensure this monster table is kept as much in memory as possible.
Jody
  • 8,021
  • 4
  • 26
  • 29
0

I'd say your only chance to further improve it is a covering index with all three columns (tag, data, value). That avoids the table access.

I don't think that partitioning can help with that.

Markus Winand
  • 8,371
  • 1
  • 35
  • 44
0

I would guess that adding an index on (tag, date) would help:

alter table table add index (tag, date);

Please post the result of an explain on this query (EXPLAIN SELECT date, value FROM ......)

Arnaud Le Blanc
  • 98,321
  • 23
  • 206
  • 194
0

I think that the value column is at the bottom of your performance issues. It is not part of the index so we will have table access. Further I think that the ORDER BY is unlikely to impact the performance so severely since it is part of your index and should be ordered.

I will argument my suspicions for the value column by the fact that the partitioning does not really reduce the execution time of the query. May you execute the query without value and further give us some results as well as the EXPLAIN? Do you really need it for each row and what kind of column is it?

Cheers!

Lachezar Balev
  • 11,498
  • 9
  • 49
  • 72
0

Try inserting just the needed dates into a temporary table and the finishing with a select on the temporary table for the tags and ordering.

CREATE temporary table foo
SELECT date, value 
FROM table 
WHERE date BETWEEN 'x' and 'y' ;

ALTER TABLE foo ADD INDEX index( tag );

SELECT date, value 
FROM foo 
WHERE tag = "a" 
ORDER BY date;

if that doesn't work try creating foo off the tag selection instead.

CREATE temporary table foo
SELECT date, value 
FROM table 
WHERE tag = "a";    

ALTER TABLE foo ADD INDEX index( date );

SELECT date, value 
FROM foo 
WHERE date BETWEEN 'x' and 'y' 
ORDER BY date;
riotopsys
  • 564
  • 4
  • 7