2

We are running MySQL/ISAM database with a following table:

create table measurements (
  `tm_stamp` int(11) NOT NULL DEFAULT '0',
  `fk_channel` int(11) NOT NULL DEFAULT '0',
  `value` int(11) DEFAULT NULL,
  PRIMARY KEY (`tm_stamp`,`fk_channel`)
);

The tm_stamp-fk_channel combination is required unique, hence the compound primary key. Now, for certain irrelevant reason, the database will be migrated to InnoDB engine. Upon googling something about it, i found out that the key will dictate the physical ordering of the data on the disk. 90% of the queries currently go as follows:

SELECT value FROM measurements
WHERE fk_channel=A AND tm_stamp>=B and tm_stamp<=C
ORDER BY tm_stamp ASC

Inserts are 99% in order of tm_stamp, it's a storage for dataloggers network. The table has low millions of rows but growing steadily. The questions are

  1. Should the sole change of storage engine result in any significant performance change, better or worse?
  2. Does the order of columns in the index matter with regards to the most popular SELECT? This blog suggest something along that line.
  3. Thanks to the nature of clustered index, may we perhaps leave out the ORDER BY clause and gain some performance?
Pavel Zdenek
  • 7,146
  • 1
  • 23
  • 38
  • you may find the following answer of interest http://stackoverflow.com/questions/4419499/mysql-nosql-help-me-to-choose-the-right-one-on-a/4421601#4421601 – Jon Black Oct 25 '12 at 10:03

3 Answers3

1

Staring at the Query

SELECT value FROM measurements
WHERE fk_channel=A AND tm_stamp>=B and tm_stamp<=C
ORDER BY tm_stamp ASC

Your static value is fk_channel and the moving ordered values is tm_stamp. This addresses your second question which seems to be at the heart of the Query's needs.

You would be way better off with PRIMARY KEY columns reversed

create table measurements (
  `tm_stamp` int(11) NOT NULL DEFAULT '0',
  `fk_channel` int(11) NOT NULL DEFAULT '0',
  `value` int(11) DEFAULT NULL,
  PRIMARY KEY (`fk_channel`,`tm_stamp`)
);

As for the first question, the storage engine dictates what gets cached.

If you remain with MyISAM, you could change the primary key to include the value column:

create table measurements (
  `tm_stamp` int(11) NOT NULL DEFAULT '0',
  `fk_channel` int(11) NOT NULL DEFAULT '0',
  `value` int(11) DEFAULT NULL,
  PRIMARY KEY (`fk_channel`,`tm_stamp`,`value`)
) ENGINE=MyISAM;

That way, your Query's data retrieval is strictly from one file at most, the .MYI of the MyISAM table. The table need not be read at all.

If your switch to InnoDB, fk_channel,tm_stamp gets loaded twice into RAM

  • Once from InnoDB data page
  • Once from InnoDB index page
Community
  • 1
  • 1
RolandoMySQLDBA
  • 43,883
  • 16
  • 91
  • 132
1

Edit 1:

It appears that changing the primary key from

PRIMARY KEY (`tm_stamp`,`fk_channel`)

to

PRIMARY KEY (`fk_channel`,`tm_stamp`)

always makes sense, for both MyISAM and InnoDB. See http://sqlfiddle.com/#!2/0aa08/1 for proof this is so.

Original answer:

To determine if changing

PRIMARY KEY (`tm_stamp`,`fk_channel`)

to

PRIMARY KEY (`fk_channel`,`tm_stamp`)

would improve your query's performance, you need to determine which field's values cardinality is higher (which field's values are more varied). Running

SELECT COUNT(DISTINCT tm_stamp), COUNT(DISTINCT fk_channel) FROM measurements;

will give you the cardinality of the columns.

So, to answer your question properly we first need to know: What are the common range of values between B and C? 60? 3,600? 86,400? more?

For example, let's say that

SELECT COUNT(DISTINCT tm_stamp), COUNT(DISTINCT fk_channel) FROM measurements;

returns 32,768 and 256. 32,768 divided by 256 is 128. This tells us that tm_stamp has 128 unique values for every value of fk_channel.

So if the difference between B and C is usually less than 128, then leave tm_stamp as the first field in the primary key. If 128 or greater, then make fk_channel the first field.

Another question: Does fk_channel need to be an INT (4 billion unique values, half of which are negative)? If not, then changing fk_channel to TINYINT UNSIGNED (if you have 256 unique values), or SMALLINT UNSIGNED (65536 unique values) would save a lot of time and space.

For example, let's say you have 256 maximum possible fk_channel values, and 65,536 possible values, then you could change your schema via:

create table measurements_new (
  tm_stamp INT UNSIGNED NOT NULL DEFAULT '0',
  fk_channel TINYINT UNSIGNED NOT NULL DEFAULT '0', -- remove UNSIGNED if values can be negative
  value SMALLINT UNSIGNED DEFAULT NULL, -- remove UNSIGNED if values can be negative
  PRIMARY KEY (tm_stamp,fk_channel)
) ENGINE=InnoDB
SELECT
  tm_stamp,
  fk_channel,
  value
FROM
  measurements
ORDER BY
  tm_stamp,
  fk_channel;

RENAME TABLE measurements TO measurements_old, measurements_new TO measurements;

This will store the existing data in the new table in PRIMARY KEY order, which will improve performance somewhat.

Ross Smith II
  • 11,799
  • 1
  • 38
  • 43
  • Accepting answer for the visible effort put in the original answer as well as the revised sqlfiddle proof. While the difference between `B` and `C` is roughly on the order of 1-10.000's (*that is, for one channel*), i understand the `DISTINCT` query be referring to the whole table. And there we got **tens of millions** `tm_stamp`s per `fk_channel`. So the usefulness of swapping primary key order is even more sound. And fk_channel actually is a TINYINT, i only wanted to make the description simple. – Pavel Zdenek Nov 02 '12 at 15:41
0

The order of your arguments in the WHERE clause is irrellavent here, the optimizer will pick the best key option (usually a direct comparison on a indexed field over a > or < comparison). With your initial example, the best option was the tm_stamp <> comparison which was not a direct equality check and therefore sub-par.

However, the order of the clustered key does matters.... If the exact comparison is always on the fk_channel column, I'd change the PK to be:

   PRIMARY KEY (`fk_channel`,`tm_stamp`)

Now you've got an index that will benefit from the fk_channel=A in your where clause.

Also, while the storage engine plays a role somewhat, but I don't think the issue here is between innodb & myisam.

Finally, I don't think the ORDER BY clause has much bearing on your issue, that's done post query. A group by could affect your performance....

Ray
  • 40,256
  • 21
  • 101
  • 138