4

I have the following choice:

I have a huge table (9999999999999 rows), let's call it tableHuge, and I would like to split it into multiple tables (to optimize queries). This table contains dates (days of the month), and most queries are made using a specified month as search key in select. This leads me to the following choices:

Choice one: Split the table into multiple tables, using a month as his tail (like lessHugeTable_01, lessHugeTable_02, etc.). Then I can take care in my app to access table that I need. The main downside is loosing the ability to join, in cases that includes more than one month (or join with an union... well.. complications).

Choice two: Use table partitioning.

Since I never used partitioning before (so I don't have knowledge to compare), I would like some advice on how to do it, pros and cons if possible (except obvious things like "if your manual partition table gets broken you loose only that data while in table part you loose whole data").

Thank you for your time.

zozo
  • 8,230
  • 19
  • 79
  • 134
  • 1E13 rows? _what_ is that? Why do not archive old rows? – Alma Do Feb 21 '14 at 10:12
  • I typed a lot of nines... was trying to say it has a looooot of rows. – zozo Feb 21 '14 at 10:13
  • Answer depends of - can your date condition be a solid partition definer or not. And that is mostly about - will new records be written to old dates or not – Alma Do Feb 21 '14 at 10:15
  • It can be a solid partition. I oversimplified, the table tail will be _YYYYMM, making it solid. But the nr. of tables created, the inability to use joins... kind of hinder me. – zozo Feb 21 '14 at 10:18
  • What about new records that may(or may not) be written to old tables/partitions because of old date in them? Also which queries are supposed to run most often? What will be preferred - adding data or gathering some statistic reports (or both) ? – Alma Do Feb 21 '14 at 10:21
  • May write in both, used both for adding data, and gathering data for reports. Or actually... if it makes a huge difference can you explain why please? – zozo Feb 21 '14 at 10:24
  • It can't be explained in two words. And without clarifications from your side your question is too broad – Alma Do Feb 21 '14 at 10:26
  • Some references? Can't find anything edifying. I'm available for any question. Basically I'm interested in the most complex case (with insert/update/delete/select from any of the tables), and in the simplest case (I need it in multiple places), with only read from last table. – zozo Feb 21 '14 at 10:31

1 Answers1

4

The answer here is really "depends".

More specifically it depends on the nature of your data, what accesses your data and how that data is accessed.

From the sounds of it you might be best off with a table partitioned by year and month. I am making wild assumptions here that you will need to access older data less frequently/never and hence will be able to archive it off to keep data volumes down in your main table (like I said "depends"!);

If your table is, and always will be, accessed by one application alone into which you can build logic to handle your 'tail' naming conventions then you might want to go down the multiple tables route.

Here is how I see pros and cons stacking up:

Multiple Tables Pros

  1. Smaller individual table if just selecting data for a single month
  2. Errr. I can only think of one actually

Multiple Tables Cons

  1. Difficulty in querying/updating multi-month datasets
  2. What happens if you get data from January in the February table? "But it will never happen!". Really? Really?!
  3. If multiple applications need to access these tables then they will all have to have your 'tail' naming convention logic in place i.e. lessHugeTable_02 has data from February in it.

and now partitioning:

Partitioning Tables Pros

  1. You're letting MySQL take care of your data sharding for you. So no "this month = this table" logic needed in your application
  2. No risk of January data getting into the February table
  3. Joining becomes easier since you have a single logical (if not physical) table
  4. If you are using MySQL 5.5 or newer then you can truncate partitions. Really handy for any housekeeping you might want to do

Partitioning Tables Cons

  1. Potentially you have a much much larger dataset to query. If you run a query that spans multiple partitions then it will probably take a while. Choose your partition key wisely!
  2. Probably more but I am running out of time and wild assumptions!

PS There is a good answer on some of the points here

Community
  • 1
  • 1
Tom Mac
  • 9,693
  • 3
  • 25
  • 35
  • 1
    Really good analysis +1, sorry for lack of details but would take more than a few hours to show all the cases where I need something like that, and most of them are pretty specific. Also good reference. I'll leave the question open for 1 more day, then if nothing else appears I'll accept (hope you don't mind I don't accept on the spot). – zozo Feb 21 '14 at 13:30