21

I have a huge table that stores many tracked events, such as a user click.

The table is already in the 10s of millions, and it's growing larger every day. The queries are starting to get slower when I try to fetch events from a large timeframe, and after reading quite a bit on the subject I understand that partitioning the table may boost the performance.

What I want to do is partition the table on a per month basis.

I have only found guides that show how to partition manually each month, is there a way to just tell MySQL to partition by month and it will do that automatically?

If not, what is the command to do it manually considering my partitioned by column is a datetime?

BinaryButterfly
  • 18,137
  • 13
  • 50
  • 91
shaharmor
  • 1,636
  • 3
  • 14
  • 26

4 Answers4

23

As explained by the manual: http://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html

This is easily possible by hash partitioning of the month output.

CREATE TABLE ti (id INT, amount DECIMAL(7,2), tr_date DATE)
    ENGINE=INNODB
    PARTITION BY HASH( MONTH(tr_date) )
    PARTITIONS 6;

Do note that this only partitions by month and not by year, also there are only 6 partitions (so 6 months) in this example.

And for partitioning an existing table (manual: https://dev.mysql.com/doc/refman/5.7/en/alter-table-partition-operations.html):

ALTER TABLE ti
    PARTITION BY HASH( MONTH(tr_date) )
    PARTITIONS 6;

Querying can be done both from the entire table:

SELECT * from ti;

Or from specific partitions:

SELECT * from ti PARTITION (HASH(MONTH(some_date)));
BinaryButterfly
  • 18,137
  • 13
  • 50
  • 91
Wolph
  • 78,177
  • 11
  • 137
  • 148
  • 2
    You mean that once i partition a whole year, the next year will go to the same partition? – shaharmor Oct 14 '13 at 08:08
  • 6
    Indeed, with this particular hashing method. Alternatively you could do `YEAR(tr_date) * 12 + MONTH(tr_date)` of course – Wolph Oct 14 '13 at 08:23
  • so if i want to HASH based on both YEAR and MONTH i can do something like: `HASH(CONCAT(YEAR(datetimefield), MONTH(datetimefield))` or should it be something else? – shaharmor Oct 14 '13 at 08:44
  • 25
    I preferred using `PARTITION BY HASH((YEAR(TIMESTAMP) * 100) + MONTH(TIMESTAMP))`, it gives nicely formatted dates (201511, 201512, 201601, etc) - same as the concat function though – keithl8041 Jan 07 '16 at 20:03
  • 2
    Can you please explain why you would only have 6 partitions and not 12 - does this mean that each partition would store data for 2 months? – Scary Wombat Jun 15 '16 at 02:43
  • @ScaryWombat it was a completely arbitrary choice. You could easily increase it and/or include years as well – Wolph Jun 15 '16 at 07:47
  • Can you please tell me ALTER TABLE query for above rather than CREATE TABLE. – Ronak Patel Jun 28 '16 at 06:20
  • @RonakP: please take a look at the modified answer :) – Wolph Jun 28 '16 at 14:54
  • Thank you so much @Wolph, can you please share select query for this ? – Ronak Patel Jun 29 '16 at 07:32
  • @RonakP The select query is just as you would normally use it. I've added a few examples – Wolph Jun 29 '16 at 12:03
  • 1
    @ScaryWombat Yes, each partition would have two months. `When PARTITION BY HASH is used, MySQL determines which partition of num partitions to use based on the modulus of the result of the user function. In other words, for an expression expr, the partition in which the record is stored is partition number N, where N = MOD(expr, num)` So partitions would include months 1&7, 2&8, etc., years ignored. – Alden W. Jul 05 '16 at 16:22
  • 1
    This query is syntactically incorrect: SELECT * from ti PARTITION (HASH(MONTH(some_date))); Try running it. Please advise correct syntax for querying table which partitioned as follows: PARTITION BY HASH( MONTH(tr_date) ) – Striker Jan 29 '19 at 20:27
  • @Striker: not according to the manual: https://dev.mysql.com/doc/refman/8.0/en/partitioning-selection.html – Wolph Jan 30 '19 at 00:08
  • @Wolph The syntax is for explicit partition names which you have with range partitioning, but you have no explicit names in your example – Sebi2020 Jan 23 '22 at 10:35
4
CREATE TABLE `mytable` (
  `post_id` int DEFAULT NULL,
  `viewid` int DEFAULT NULL,
  `user_id` int DEFAULT NULL,
  `post_Date` datetime DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
PARTITION BY RANGE (extract(year_month from `post_Date`))
(PARTITION P0 VALUES LESS THAN (202012) ENGINE = InnoDB,
 PARTITION P1 VALUES LESS THAN (202104) ENGINE = InnoDB,
 PARTITION P2 VALUES LESS THAN (202108) ENGINE = InnoDB,
 PARTITION P3 VALUES LESS THAN (202112) ENGINE = InnoDB,
 PARTITION P4 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
  • 1
    add context to prevent down-voting. End of Review. – ZF007 Nov 24 '20 at 13:08
  • Is the Optimizer smart enough to handle `extract(year_month...)`? (There is a very limited number of expressions that can be effectively used in the `PARTITION` statement.) – Rick James Nov 27 '20 at 21:59
1

Be aware of the "lazy" effect doing it partitioning by hash:

As docs says:

You should also keep in mind that this expression is evaluated each time a row is inserted or updated (or possibly deleted); this means that very complex expressions may give rise to performance issues, particularly when performing operations (such as batch inserts) that affect a great many rows at one time.

The most efficient hashing function is one which operates upon a single table column and whose value increases or decreases consistently with the column value, as this allows for “pruning” on ranges of partitions. That is, the more closely that the expression varies with the value of the column on which it is based, the more efficiently MySQL can use the expression for hash partitioning.

For example, where date_col is a column of type DATE, then the expression TO_DAYS(date_col) is said to vary directly with the value of date_col, because for every change in the value of date_col, the value of the expression changes in a consistent manner. The variance of the expression YEAR(date_col) with respect to date_col is not quite as direct as that of TO_DAYS(date_col), because not every possible change in date_col produces an equivalent change in YEAR(date_col).

Joundill
  • 6,828
  • 12
  • 36
  • 50
-2

HASHing by month with 6 partitions means that two months a year will land in the same partition. What good is that?

Don't bother partitioning, index the table.

Assuming these are the only two queries you use:

SELECT * from ti;
SELECT * from ti PARTITION (HASH(MONTH(some_date)));

then start the PRIMARY KEY with the_date.

The first query simply reads the entire table; no change between partitioned and not.

The second query, assuming you want a single month, not all the months that map into the same partition, would need to be

SELECT * FROM ti  WHERE the_date >= '2019-03-01'
                    AND the_date  < '2019-03-01' + INTERVAL 1 MONTH;

If you have other queries, let's see them.

(I have not found any performance justification for ever using PARTITION BY HASH.)

Rick James
  • 135,179
  • 13
  • 127
  • 222