

We have a single huge table of around 10 TB in which each row is approximately 10 KB. The primary key is a BIGINT and its values are uniformly distributed. The server has 40 GB of RAM, two 4-core Xeon CPUs, and five 2 TB disks. The workload consists solely of single-row inserts/updates and selects by primary key. No transactions or joins are needed.

Considering the table size and the available RAM, it seems that caching will not be very effective since the accesses are mostly random. The question is which configuration gives more queries per second (especially for selects):

  1. Use RAID-0 or LVM striping to increase read/write performance.
  2. Use table partitioning (by primary key) and store each partition on a different disk.

We want to use MySQL, but another open source DBMS like PostgreSQL is also acceptable. Logically, partitioning should work better for our setting because the seeks are distributed across different disks. However, the more commonly used pattern is RAID-0.

Does anyone have any ideas or experience related to this problem?

Saeed Shahrivari
  • Related question: http://stackoverflow.com/questions/45879/mysql-partitioning-sharding-splitting-which-way-to-go?rq=1 – AmazingDreams Dec 06 '13 at 15:41
  • Yes, I have seen that but it is not helpful! – Saeed Shahrivari Dec 06 '13 at 15:45
  • What kind of data are you storing? Maybe what you need is a distributed key-value store, not a singular gigantic table. Postgres and MySQL offer different advantages, so unless you can qualify what you're doing, they're both valid options. – tadman Dec 06 '13 at 16:09
  • @tadman Most of the data is compressed text. I know we can use solutions like Voldemort, HBase, and Cassandra. But we have just a single server. I should admit that we are using the RDBMS as a key/value store. – Saeed Shahrivari Dec 06 '13 at 16:13
  • A single server is not the answer here. At the very least set up a hypervisor and have one virtual instance per physical drive on your system, that'll give you a lot more durability. A key-value store is pretty easy to shard. – tadman Dec 06 '13 at 16:35

1 Answer


Whether you partition the data across separate physical disks or use RAID 0, the idea is the same: you split disk access among multiple drive heads/platters that can work in parallel to speed up access.

Theoretically, the two would be fairly equivalent; however, partitioning should come out ahead once you consider the benefit to indexes.

You stated in your question that you are inserting records (not merely appending), so your indexes will need frequent maintenance. It's less overhead to rebuild a smaller partitioned index than one huge index, not to mention the concurrency benefit: the other partitions don't have to wait while one is being rebuilt.

In addition, searching smaller, partitioned indexes is faster. These index benefits are the main reason that table partitioning exists, so this seems like a good case for it.

I've never partitioned a table across multiple physical disks. Assuming it's possible, it sounds like a great idea.
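For what it's worth, MySQL does allow placing individual partitions on different disks via the `DATA DIRECTORY` option. A minimal sketch of what that might look like, assuming MySQL 5.6+ with `innodb_file_per_table` enabled (earlier versions only honor this option for MyISAM), and with hypothetical mount points `/disk1` through `/disk5` for the five drives:

```sql
-- Hash partitioning spreads a uniformly distributed BIGINT key
-- evenly across the five partitions, one per physical disk.
CREATE TABLE kv_store (
    id  BIGINT NOT NULL PRIMARY KEY,
    val MEDIUMBLOB          -- ~10 KB of compressed text per row
)
PARTITION BY HASH (id) (
    PARTITION p0 DATA DIRECTORY = '/disk1/mysql',
    PARTITION p1 DATA DIRECTORY = '/disk2/mysql',
    PARTITION p2 DATA DIRECTORY = '/disk3/mysql',
    PARTITION p3 DATA DIRECTORY = '/disk4/mysql',
    PARTITION p4 DATA DIRECTORY = '/disk5/mysql'
);
```

A point lookup like `SELECT val FROM kv_store WHERE id = 12345` then touches only the one partition (and disk) that `HASH(id)` maps to, which is the seek-distribution behavior you're after.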

Marcus Adams
  • RAID is not going to help here. When a single drive in your array goes, the whole thing is going down in flames. From a reliability standpoint you're better to have N servers of size 1 rather than one server of size N. The more drives in a RAID-0 array, the higher the chance of catastrophic failure. – tadman Dec 06 '13 at 16:33
  • @SaeedShahrivari, RAID 0 seeks would be distributed as long as your stripe size was > 10k (e.g. 128k). You'd still have boundary issues though. RAID 0 deals with sectors, not files, and each head/platter serves sectors up as fast as they can. Still, partitioning is the better solution for the reasons I state in my answer. – Marcus Adams Dec 06 '13 at 17:32
  • @MarcusAdams, you are right that RAID-0 can help improve IOPS, but distributing seeks is not guaranteed in RAID-0, and as you said, I think partitioning is the better solution. – Saeed Shahrivari Dec 06 '13 at 20:43