0

We are operating an ad network and have some performance issues recently.

The table making problems is the table "clicks" with columns: click_id int, affiliate_id int, campaign_id int, country_id int, time datetime.

All our Reads and Inserts have country_id in WHERE clause.

In Fact a clicks from one country always blocks the complete table for every other country too. This is not necessary and slows down our performance a lot.

Dear MySQL professionals, am I right to understand that partitioning the clicks table on country_id is a great solution for us?

I read about it in mysql manual

But I'm not sure yet which Partitioning Type would be suitable for our needs.

Which one would you recommend and why?

Marki555
  • 6,434
  • 3
  • 37
  • 59
Benjamin
  • 31
  • 3
  • depends on what queries you do with that table. But most probably partition by date - you can easily remove / archive old items. – Marki555 Dec 03 '15 at 12:16
  • What do you think about the idea to make a partition for each country? – Benjamin Dec 03 '15 at 12:35
  • is the country distribution even? If half the data is e.g. USA you won't gain much from partitioning by country. – Marki555 Dec 03 '15 at 12:37
  • No, it's not. We are currently in the USA and in UK, having 70 % and 30 %. Clicks on those countries are not related to each other. In fact we could even split it to two tables "clicks_usa" and "clicks_uk". So I thought partititiong could help us, did I get it wrong? – Benjamin Dec 03 '15 at 12:44
  • yes, but only for few months... what will you do when USA data will be as large as all data now? You would need to partition by something else anyway... The question is: what are you trying to solve by partitioning? Slow INSERTs? SELECTs? DELETEs? INSERT into InnoDB table should be equally fast regardless of table size. – Marki555 Dec 03 '15 at 14:41
  • my main problem is not that the overall data is too huge, but the data of the last 24 hours until 7 days. Because on those durations we are making different reads. As long our business doesn't grow unexpected much, it will stay the same data in those durations. The problem is that queries now need to search 250.000 entries in last 7 days instead of 170.000 before we started in UK. As it worked perfectly before, I hoped to get the same by partitioning the countries. Bad idea? – Benjamin Dec 03 '15 at 15:58
  • If I got you right it might be better to make a partition for last 7 days and one for older data. Right? In this case we would ne a shell script that runs every night and places data older than 7 days to the other partition. What do you think about that approach? – Benjamin Dec 03 '15 at 16:56
  • You can have partition for each week (with partition names for example `201501` to `201548`). No need to move data between the partitions. Then MySQL would use only the partitions needed for each query. Also consider pre-computing aggregate tables for those 7 days. – Marki555 Dec 04 '15 at 12:22

1 Answers1

0

Do not use PARTITION simply because you think it might speed things up. It won't.

There are only 4 cases (that I know of) where PARTITIONing will speed up a system. My blog covers, in detail, the dropping "old" data as one of the useful cases, plus mentions the other 3.

I hope your country_id is a 2-byte "country code", not a 4-byte INT.

That the heck is a click_id?

In general high-traffic counters (eg, click counters) should be in a separate table with perhaps nothing more than an id and a counter. This takes a load off the much-bulkier main table and avoids some categories of locks.

I can see no use for partitioning on country_id.

"7 days plus 'older'" -- There is a problem with this. Tomorrow you will want to move a day's worth of data into the big "older" partition. That involves REORGANIZE PARTITION, which will copy all of "older" over. Too slow, and blocks the entire table. I've done such, but I had to make sure there was a "maintenance window" during which I could do it.

All our Reads and Inserts have country_id in WHERE clause.

Let's see some typical SELECTs. Do they always have a GROUP BY day? If so, then summary tables for each day may be warranted. This will greatly offload the SELECT activity on the Fact table and move it to the smaller, better indexed summary table(s). another blog.

Rick James
  • 135,179
  • 13
  • 127
  • 222