
I have a table that has reached nearly 3 million records. While I appreciate this amount may not be considered large for an RDBMS, I have noticed a slowdown in operations that deal with this table. I feel I can optimize or re-architect it.

This is part of the database for the PocketBracket March Madness App. Essentially the table stores metadata for a one-to-many relationship (Brackets has many Bracket Picks). The twist is that the demand on the table differs over time. For example, there is a short period (two weeks) where the table is handling mostly writes. For the rest of the year it is mostly reads. Furthermore, the vast majority of the records are rarely accessed.

Here is a screenshot of the current structure:

[screenshot of the current table structure]

With that, here are some thoughts I have:

  • Move the old records into a separate archive table. Reduces the number of records, but would require code modifications.
  • Denormalize the table so the models are 1-1 (i.e. condense all bracket picks into a single serialized column). Reduces the number of records, but would require code modifications.
  • Swap the table's engine and/or indexes during demand periods (i.e. InnoDB vs. MyISAM).
  • Something I haven't thought of...
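
To sketch the first option, it could be as simple as an archive table plus a periodic move of stale rows. (The table and column names below are placeholders, not my actual schema, and they assume brackets carry a tournament year.)

-- Hypothetical schema: brackets(id, tournament_year, ...),
-- bracket_picks(id, bracket_id, ...)
CREATE TABLE bracket_picks_archive LIKE bracket_picks;

-- Copy picks belonging to past tournaments into the archive
INSERT INTO bracket_picks_archive
SELECT bp.*
FROM bracket_picks AS bp
JOIN brackets AS b ON b.id = bp.bracket_id
WHERE b.tournament_year < YEAR(CURDATE());

-- Then remove them from the hot table (multi-table DELETE)
DELETE bp
FROM bracket_picks AS bp
JOIN brackets AS b ON b.id = bp.bracket_id
WHERE b.tournament_year < YEAR(CURDATE());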

I would appreciate your direction. In the end, I am fine with code changes, I just want to ensure I'm re-architecting in the right direction.

Jason McCreary
  • Are you experiencing slow writing, reading or both? – Vyktor Feb 09 '12 at 17:58
  • I would say both. Is there a benchmark I can provide that will help demonstrate? – Jason McCreary Feb 09 '12 at 19:45
  • If you have a benchmark like this one: `DO BENCHMARK(10000, (SELECT col FROM table INNER JOIN ... WHERE bracketID = RAND()))` (be careful to join foreign columns and select one of the joined fields), it may help. Are you using InnoDB or MyISAM? (I didn't notice it in your question, please add that) – Vyktor Feb 09 '12 at 19:51
  • MyISAM. I updated the question. Not familiar with `DO BENCHMARK`, I'll test it. – Jason McCreary Feb 09 '12 at 23:08
  • I **guess** that InnoDB may perform better when you apply `FOREIGN KEYS`; will you be able to test that somewhere? (It has a real "physical" connection between referenced rows, so addressing may work faster.) The best would be to `DO BENCHMARK` with both MyISAM and InnoDB (I'm curious myself). – Vyktor Feb 09 '12 at 23:14
  • Based on http://blogs.oracle.com/MySQL/entry/comparing_innodb_to_myisam_performance, InnoDB is much faster. I'll get the whitepaper, check it, and then form it into a real answer if you're okay with that :) – Vyktor Feb 09 '12 at 23:19
  • For the `BENCHMARK` on the current table (MyISAM), **100** iterations took **364.7781 seconds**. I'll benchmark it against InnoDB locally later. – Jason McCreary Feb 09 '12 at 23:27
  • 1
    perhaps a table partitioning might help you abit too. http://dev.mysql.com/doc/refman/5.1/en/partitioning-overview.html – Sergey Benner Feb 09 '12 at 23:55
  • 1
    You should read the following answer and understand how innodb uses clustered indexes before converting. A poorly designed innodb table can perform as badly as any other. http://stackoverflow.com/questions/4419499/mysql-nosql-help-me-to-choose-the-right-one-on-a/4421601#4421601 – Jon Black Feb 11 '12 at 18:46
  • @Sergey Benner, excellent suggestion. This appears to have helped. Please make an answer so I can properly mark it. – Jason McCreary Feb 13 '12 at 18:51
  • @JasonMcCreary I've added the answer with partitioning. – Sergey Benner Feb 14 '12 at 10:19

3 Answers


Based on this Oracle blog article (and the attached whitepaper), I assume that migrating from MyISAM to InnoDB may solve your problems. Just out of curiosity, here is their benchmark hardware configuration:

  • 4 sockets, 48 cores total, 4 x 12-core AMD Opteron 6172 “Magny-Cours” 2.1GHz CPUs. (Note: 36 cores were allocated to MySQL and the remaining 12 to the Sysbench processes.)
  • 64 GB DDR3 RAM
  • 2 x Intel X25E SSD drives

But more importantly, the read-write comparison:

As the graph below shows, InnoDB delivered 35x higher throughput than MyISAM, while achieving 85% - 90% scalability from 6 to 36-cores. Above 30-cores, the scalability curve starts to flatten out as the number of hot mutexes grow, but performance does still continue to increase.

And read-only comparison:

InnoDB delivered 4.6x higher throughput than MyISAM, while achieving 90% - 95% scalability from 6 to 36-cores. Above 30-cores, scalability flattens out as the server is again saturated by a number of hot mutexes.

All quotes are from Oracle's article of January 2011, with their copyright: Copyright © 2011, Oracle and/or its affiliates. All rights reserved.

The only disadvantages they mention in InnoDB vs. MyISAM are:

  • No R-Trees
  • No fulltext indexes
  • Maximum table size 64TB (MyISAM 256TB).

Here's an article on tuning InnoDB.

You should probably BENCHMARK your queries on both the MyISAM and InnoDB engines (make sure that you set up FOREIGN KEYs correctly). You can use a benchmark like this:

DO BENCHMARK(100, (
    SELECT games.someField
    FROM brackets
    INNER JOIN relation_table ON relation_table.bracketID = brackets.id
    INNER JOIN games ON games.id = relation_table.gameID
    LIMIT 1
));

If migrating to InnoDB doesn't help, I'm afraid the best solution will be to modify your code and store the old records somewhere else (increasing the database's memory limits may also affect its performance greatly).
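
For reference, the engine swap itself is a single statement, though it rewrites the whole table, so run it off-peak. The table and column names below are assumptions about your schema:

-- Convert the picks table (assumed name) to InnoDB
ALTER TABLE bracket_picks ENGINE = InnoDB;

-- Then add the foreign key so InnoDB can enforce and use it
ALTER TABLE bracket_picks
  ADD CONSTRAINT fk_bracket_picks_bracket
  FOREIGN KEY (bracketID) REFERENCES brackets (id);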

Anyway, please post your results in a comment; I'm curious about this one.

Vyktor
  • 1
    Unfortunately simply switching to InnoDB actually doubled my benchmark times. However, I have not finalized foreign keys. This is something I will continue to test when I have more time. For now though, partitioning seemed to provide immediate results and is less intensive. I really appreciate the thorough answer and wish I could upvote more than once. – Jason McCreary Feb 13 '12 at 18:53
  • 1
    @Silenteh add `@JasonMceary` to your comment .) – Vyktor Feb 14 '12 at 18:32

Before going crazy with "optimizations" like partitioning, sharding, denormalization, etc., which will just introduce a lot of additional problems, I would first try to determine what is causing the slowdown.

To give you an example, I have a table of around 30 million records with quite a lot of inserts and reads per second, and I can get query results of around 2000 records in less than 300 ms (and it could probably still be improved).

Additionally, keep in mind that partitioning makes sense when you can split the files across multiple disks and therefore increase the overall I/O. In other words, you would need one partition file per disk to achieve good performance.

So first of all, it is important to have some more info:

  • Amount of server RAM
  • Server disk type(s)
  • Is there a RAID? If yes, which one?
  • Is it a virtual server?
  • How many queries per second does MySQL handle?

It could simply be that the table is fragmented and the MySQL config needs some more tuning. First of all, you should switch to InnoDB, as Vyktor suggested, and then configure the InnoDB buffer pool to a higher value; the default is too low. Here is an example from my config file. Keep in mind that the parameters are tuned for my type of data, queries, and server specs. Additionally, I am using a variant of MySQL called Percona Server, which could help you out as well, as it has proved to be faster. On their site you can find some benchmarks.

innodb_file_per_table
innodb_file_format = barracuda

innodb_flush_log_at_trx_commit = 2
innodb_buffer_pool_size = 3G
query_cache_size = 98304
innodb_log_file_size = 10485760
innodb_log_buffer_size = 3145728

I would also try running mysqlcheck. WARNING: it locks the table!
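
For example, to analyze and defragment the table (the database and table names here are assumptions):

mysqlcheck --optimize my_database bracket_picks -u root -p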

If you need more info about MySQL tuning, this is a great blog.

Silenteh
  • Thanks for the additional info. And I do agree with you. As noted in my question, I don't feel 3 million records is *large relative to RDBMS*. Unfortunately I am on a shared environment at the moment. So modifying some of the configurations you suggested are likely not possible. – Jason McCreary Feb 14 '12 at 01:10
  • @JasonMceary In this case what I suggest is to start thinking about a different hosting solution. You might be able to solve this problem by denormalizing the data, but at some point you will hit the same wall again: the hardware. It is in fact likely that the shared environment is the problem in your case: low I/O because it is shared, and default MySQL configs. – Silenteh Feb 14 '12 at 20:26
  • Understood and I don't disagree. For now though, I am willing to accept a temporary solution until a more permanent one presents itself. I think there are some excellent suggestions here. Just need to determine which one solves the problem with the least *resistance*. – Jason McCreary Feb 14 '12 at 23:16

Ok, partitioning has the following advantages. The excerpts below are taken from the MySQL documentation. I also provide a list of links on table partitioning for different databases at the end of the answer. Some of you might also want to read about sharding: http://en.wikipedia.org/wiki/Shard_(database_architecture)

But as with every technology, partitioning should be treated with care rather than adopted blindly; it has its disadvantages. The main one I find is that it requires a lot of interaction, and manageability suffers, as Tom Kyte said on his Oracle blog:

are your tables getting larger then you feel comfortable managing?  eg: it might take longer to 
restore a 100gig tablespace than 1-10 gig tablespace (and the other 90gig of data is online whilst 
doing this) 

Advantages:

  • Partitioning makes it possible to store more data in one table than can be held on a single disk or file system partition.

  • Data that loses its usefulness can often be easily removed from a partitioned table by dropping the partition (or partitions) containing only that data. Conversely, the process of adding new data can in some cases be greatly facilitated by adding one or more new partitions for storing specifically that data.

  • Some queries can be greatly optimized by virtue of the fact that data satisfying a given WHERE clause can be stored on only one or more partitions, thereby automatically excluding any remaining partitions from the search. Because partitions can be altered after a partitioned table has been created, you can reorganize your data to enhance frequent queries that may not have been often used when the partitioning scheme was first set up. This ability to exclude non-matching partitions (and thus any rows they contain) is often referred to as partition pruning, and was implemented in MySQL 5.

Other benefits usually associated with partitioning include those in the following list. These features are not currently implemented in MySQL Partitioning, but are high on our list of priorities.

  • Queries involving aggregate functions such as SUM() and COUNT() can easily be parallelized. A simple example of such a query might be SELECT salesperson_id, COUNT(orders) as order_total FROM sales GROUP BY salesperson_id;. By “parallelized,” we mean that the query can be run simultaneously on each partition, and the final result obtained merely by summing the results obtained for all partitions.

  • Achieving greater query throughput by virtue of spreading data seeks over multiple disks.
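
As a concrete sketch of how this could apply to the bracket picks table (the table and column names are assumptions; pruning kicks in when queries filter on the partitioning column):

-- MySQL 5.1 RANGE partitioning by an assumed tournament_year column
ALTER TABLE bracket_picks
PARTITION BY RANGE (tournament_year) (
    PARTITION p2010 VALUES LESS THAN (2011),
    PARTITION p2011 VALUES LESS THAN (2012),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Note: MySQL requires the partitioning column to be part of every
-- unique key (including the primary key) on the table.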


The links:

  • mysql :

http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

http://forums.mysql.com/list.php?106

http://www.slideshare.net/datacharmer/mysql-partitions-tutorial

  • mssql server:

http://msdn.microsoft.com/en-us/library/ms345146(v=sql.90).aspx

http://msdn.microsoft.com/en-us/library/ms190787.aspx

  • oracle:

http://docs.oracle.com/cd/B10501_01/server.920/a96524/c12parti.htm

http://docs.oracle.com/cd/B28359_01/server.111/b32024/partition.htm

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:728425384831

  • postgresql:

http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html


Hope it helps a bit.

Sergey Benner