
We are currently evaluating the use of MySQL partitions for a small application of ours. The application basically just sits at the end of a message queue and logs our API requests (that include a timestamp) to a database using Hibernate. Unfortunately we get a lot of requests and querying the database has become very slow.

What we would like to do is partition the table by timestamp (per month) since our regular query pattern is something like "get certain requests between time A and B". If A and B are in two consecutive months, which will mostly be true, then this would just hit two partitions.

As MySQL's range partitions have to be created manually, I would like to add this maintenance task to our Java application, where it can be done automatically. The idea is something like:

  1. Have a utility Thread that runs regularly (using ScheduledExecutorService or something)
  2. In the thread, check if there is a partition for the next month
  3. If not, create it
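A minimal sketch of that scheduler (class and method names are mine; the actual partition check/creation is left as a placeholder `Runnable`):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PartitionMaintenance {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Placeholder for the real work: check whether next month's partition
    // exists and run ALTER TABLE ... ADD PARTITION if it is missing.
    private final Runnable checkAndCreatePartition;

    public PartitionMaintenance(Runnable checkAndCreatePartition) {
        this.checkAndCreatePartition = checkAndCreatePartition;
    }

    public void start() {
        // Run once a day; the check is cheap and creating a partition is
        // idempotent if we test for existence first, so the exact period
        // is not critical.
        scheduler.scheduleAtFixedRate(checkAndCreatePartition, 0, 1, TimeUnit.DAYS);
    }

    public void stop() {
        scheduler.shutdown();
    }
}
```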

That's all fine, but I'm stuck trying to get MySQL's partitioning info using Hibernate and create partitions. What's the best way to do this (I'm OK if this will be MySQL specific)?

  • Is there a specific API in Hibernate to get MySQL partitioning info for a table, and also to create partitions?
  • Should I use raw SQL (SHOW CREATE TABLE ..., ALTER TABLE ... ADD PARTITION) and parse the output myself?

EDIT:

The table looks like this (I removed some columns not relevant to the question):

CREATE TABLE `request` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `apikey` varchar(32) NOT NULL,
  `timestamp` datetime NOT NULL,
  `rows` int(11) DEFAULT NULL,
  `user_id` varchar(15) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `apikey_idx` (`apikey`),
  KEY `timestamp_idx` (`timestamp`),
  KEY `apikey_timestamp_rows_idx` (`apikey`,`timestamp`,`rows`)
) ENGINE=InnoDB AUTO_INCREMENT=2190385211 DEFAULT CHARSET=utf8

and the slow query is (generated by Doctrine apparently):

SELECT 
  r0_.user_id AS user_id0, COUNT(r0_.id) AS sclr1
FROM
  request r0_
WHERE
  r0_.apikey = 'XXX' AND r0_.rows > 0 AND r0_.timestamp >= '2015-09-15 00:00:00' AND r0_.timestamp < '2015-10-15 00:00:00'
GROUP BY r0_.user_id
HAVING sclr1 > 0
ORDER BY sclr1 DESC
LIMIT 500

When EXPLAINing the query MySQL says it's using the apikey_timestamp_rows_idx index.

A little context: We want to know, for a given API key, how many requests with rows > 0 each user sent in a given period.

The table currently has about 2.2bn rows.

Georg M. Sorst

2 Answers


I am not aware of any Hibernate API that deals with table partitioning.

I think you have no choice but to use native SQL. You could either have the SQL in your Java code (as I think you are suggesting) or put it in a stored procedure.
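For the Java-side option, here is a sketch of the native SQL involved, assuming the table is partitioned `BY RANGE (TO_DAYS(timestamp))` and partitions are named `p201510`, `p201511`, etc. (my convention, not anything standard). Querying `information_schema.partitions` avoids parsing `SHOW CREATE TABLE` output; the statements can be executed via Hibernate's `createNativeQuery` or plain JDBC:

```java
import java.time.YearMonth;

public class PartitionSql {

    // Partition naming convention assumed here: p201510, p201511, ...
    static String partitionName(YearMonth month) {
        return String.format("p%04d%02d", month.getYear(), month.getMonthValue());
    }

    // Check for an existing partition via information_schema rather than
    // parsing SHOW CREATE TABLE. Bind the partition name as a parameter.
    static final String EXISTS_SQL =
            "SELECT COUNT(*) FROM information_schema.partitions " +
            "WHERE table_schema = DATABASE() AND table_name = 'request' " +
            "AND partition_name = ?";

    // DDL to add one monthly partition; the VALUES LESS THAN boundary is
    // the first day of the following month.
    static String addPartitionSql(YearMonth month) {
        YearMonth next = month.plusMonths(1);
        return "ALTER TABLE request ADD PARTITION (PARTITION " + partitionName(month)
                + " VALUES LESS THAN (TO_DAYS('" + next.atDay(1) + "')))";
    }
}
```

Note that to partition `request` at all, MySQL will require the partitioning column to be part of every unique key, so the primary key would have to become something like `(id, timestamp)` first.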

You can schedule this using Java or MySQL. If you do it with a thread in your app server, you have the issue that every one of your app servers will have such a scheduled job. This makes it hard(er) to control how often the job actually executes. This may not be a big deal in this case, as the partition-related queries are not very heavy.

You can also schedule it in MySQL (see How to schedule a MySQL query?). This option may provide more visibility into the job (e.g., to your DBA) and be easier to manage and monitor.

Rob

I don't see that partitioning can help. You must scan a lot of rows; that is what is slow.

KEY `apikey_idx` (`apikey`),
KEY `apikey_timestamp_rows_idx` (`apikey`,`timestamp`,`rows`)

The first is not needed because of the second. DROP the first. (This will speed up INSERTs.)

apikey smells like some kind of hash; is it? Is it hex? You could save a lot of disk space by UNHEXing it and storing it into BINARY(16) (in all tables that use apikey). (Smaller --> less I/O --> faster.)
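If the key is indeed 32 hex characters, the client-side equivalent of MySQL's `UNHEX()` is straightforward, and the resulting 16 bytes can be bound to a `BINARY(16)` column via `PreparedStatement.setBytes` (a sketch, assuming valid hex input):

```java
public class ApiKeys {

    // Client-side equivalent of MySQL's UNHEX(): a 32-char hex API key
    // becomes 16 raw bytes, suitable for a BINARY(16) column.
    static byte[] unhex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

Existing rows can be converted once on the server side with `UNHEX(apikey)` during the migration.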

Assuming that the rows do not change after they are inserted... I would build a "summary table" that stores

  • date (from timestamp)
  • rows>0 or not
  • apikey
  • COUNT(*)

From that summary table, an equivalent SELECT will run much faster.
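One possible shape for such a summary table and its nightly incremental load (table and column names are mine; `user_id` is included so the query in the question can also be answered per user). The SQL is kept as Java string constants so it can be run as native queries from the app:

```java
public class SummarySql {

    // Hypothetical summary table: one row per (day, apikey, has_rows, user)
    // with a precomputed request count.
    static final String CREATE_SUMMARY =
            "CREATE TABLE request_summary (" +
            "  day DATE NOT NULL," +
            "  apikey VARCHAR(32) NOT NULL," +
            "  user_id VARCHAR(15)," +
            "  has_rows TINYINT NOT NULL," +
            "  cnt INT UNSIGNED NOT NULL," +
            "  KEY (apikey, day)" +
            ") ENGINE=InnoDB";

    // Nightly incremental load: summarize yesterday only, so the big table
    // is scanned for one day's rows instead of the whole query period.
    static final String LOAD_YESTERDAY =
            "INSERT INTO request_summary " +
            "SELECT DATE(timestamp), apikey, user_id, rows > 0, COUNT(*) " +
            "FROM request " +
            "WHERE timestamp >= CURRENT_DATE() - INTERVAL 1 DAY " +
            "  AND timestamp < CURRENT_DATE() " +
            "GROUP BY 1, 2, 3, 4";
}
```

The original query then becomes a `SUM(cnt) ... GROUP BY user_id` over a few hundred summary rows instead of a range scan over millions of fact rows.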

Think about building (and incrementally maintaining) Summary Tables for other queries like that.

I suggest that Hibernate is getting in the way of thinking about the optimal way to store and retrieve the data.

Rick James
  • Querying the summary tables will of course be very fast, but building them will take a lot of time, so what's the gain there? My idea with partitioning was this: The table is very big, but includes lots of data we don't care about (currently). So if all the data we care about is in one or two partitions (the last two months) then the relevant indices, table files etc. will become smaller and thus easier to cache etc. Is this not correct? – Georg M. Sorst Oct 19 '15 at 21:26
  • Once the Summary Table(s) are initialized, _incrementally_ augment them. For example, at midnight, summarize yesterday's data with `INSERT INTO Summary SELECT DATE(timestamp), apikey, rows>0, COUNT(*) FROM Fact WHERE timestamp >= CURRENT_DATE() - INTERVAL 1 DAY AND timestamp < CURRENT_DATE() GROUP BY 1,2,3;` – Rick James Oct 20 '15 at 00:09