
Our company's web application stores a ton of data points on thousands of visitors a day, and we anticipate the hard disks will fill up soon. Our server cannot support more hard drives, and we are not interested in little tricks that free up just enough space to buy us a few hours.

How can we solve this issue? The database is huge, over 200GB, and our website needs to be available, so I don't believe copying it and moving it to a new, larger server is a good option for us. Furthermore, what happens when THAT server runs out of disk space?

What do large-scale websites normally do to remedy this issue?

Thanks!

Paul B
  • They usually archive/purge the data on some sort of retention plan, or build a system that can scale as per their business requirements. If you need to keep that data, and expect it to grow at a fairly uncontrolled rate, you did not properly size your system. – Jeremy Holovacs Jun 17 '12 at 04:13
  • We didn't. I'm asking what we can do now. – Paul B Jun 17 '12 at 04:23
  • Well, it depends on your business requirements. You can either 1) build a new system with scalable storage (NAS or SAN) and migrate or 2) implement a retention policy. Or you can do nothing and let the system crash. I don't see many other options for you. There's no magic trick that will make a problem of your own making go away. – Jeremy Holovacs Jun 17 '12 at 04:27
  • @Paul: Does all of the data need to be accessible from within your application? I mean, could some of it be archived and kept for later use, e.g. usage analysis performed by an external system? – RandomSeed Jun 17 '12 at 04:42
  • This is off-topic for SO; belongs on [sf] – Jim Garrison Jun 18 '12 at 01:45
  • All the data needs to be accessible, as we need to display reports for various date ranges to our users. Also, sorry about that Jim, I'll keep that in mind next time. – Paul B Jun 18 '12 at 19:49

1 Answer


You may want to investigate splitting the data across multiple database servers as "shards." You will likely have to add some logic to your application so it knows where to find a given set of data and how to combine results for queries whose data originates from multiple shards. There are third-party applications that can assist you with this process.
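
As a rough illustration of the application-side logic described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory dictionaries stand in for separate database servers, and the names `shard_for`, `record`, and `views_per_day` are hypothetical. A real setup would open a connection to the chosen server instead.

```python
from collections import Counter

# Hypothetical setup: each dict stands in for one database server (shard),
# mapping visitor_id -> list of (day, page_views) rows.
NUM_SHARDS = 3
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(visitor_id: int) -> int:
    """Routing logic: decide which shard owns a visitor's rows."""
    return visitor_id % NUM_SHARDS


def record(visitor_id: int, day: str, page_views: int) -> None:
    """Write a row to the single shard that owns this visitor."""
    owner = shards[shard_for(visitor_id)]
    owner.setdefault(visitor_id, []).append((day, page_views))


def views_per_day() -> Counter:
    """Cross-shard report: query every shard, then merge the partial results."""
    totals = Counter()
    for shard in shards:
        for rows in shard.values():
            for day, page_views in rows:
                totals[day] += page_views
    return totals


record(1, "2012-06-16", 5)
record(2, "2012-06-16", 7)
record(4, "2012-06-17", 3)
print(views_per_day())
```

Lookups for a single visitor touch one shard, while a report spanning all visitors has to ask every shard and combine the answers in the application. Note that with modulo routing, changing `NUM_SHARDS` later moves most keys, so the routing rule is worth choosing carefully up front.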

Turnkey
  • I don't think that covers making the data any smaller. – Dre Jun 17 '12 at 04:33
  • @Dre: IMHO the question was not "how to reduce the size of the data" but rather "how to cope with this increasing volume of data". It sounds like keeping all this data is a business requirement. Therefore, this is a valid suggestion. – RandomSeed Jun 17 '12 at 04:35
  • Yup, we don't want to decrease the size of the data. When you say "shards", is that something built into MySQL, or do you literally mean use multiple DB servers and code a way to decide which one to use? Also, is MySQL Partitioning an option, and is it feasible to set up without prior experience? – Paul B Jun 17 '12 at 04:41
  • It depends on your needs. MongoDB, for instance, is built on a sharding model, so scaling to multiple machines is fairly (logically) trivial. MySQL and other RDBMSs are generally not built to scale as nicely, so you may have to manually "shard" your data. The idea, though, of splitting your data into large, easily separable data sets is storage-system agnostic. – dimo414 Jun 17 '12 at 04:54
  • Not built into MySQL, but it is often done. You will need to review how your data is structured. For example, you could shard based on a range of primary keys or on a range of record creation dates. I put a link in the answer to DBShards, which is a tool for facilitating this with MySQL. They have a page discussing strategies: http://www.dbshards.com/dbshards/database-sharding-strategies/. – Turnkey Jun 17 '12 at 12:35
  • Alright, thanks. MySQL clustering would help with this too, right? – Paul B Jun 18 '12 at 03:18
  • Yes, and you gain some high availability as well. You will need one more server than with plain sharding, but one advantage is that it shouldn't require application changes. – Turnkey Jun 18 '12 at 04:36
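
To make the "range of create date" strategy from the comments concrete, here is a small Python sketch of range-based routing. The boundaries and server names (`db-2010` and so on) are invented for illustration, and nothing here reflects how DBShards or MySQL Cluster work internally. It also shows why date-range reports fit this layout: a report only needs the shards whose ranges overlap the requested dates.

```python
import bisect
from datetime import date

# Hypothetical layout: each shard holds rows created before its upper bound.
# The bounds and server names are invented for illustration.
UPPER_BOUNDS = [date(2011, 1, 1), date(2012, 1, 1), date(2013, 1, 1)]
SHARD_NAMES = ["db-2010", "db-2011", "db-2012"]


def shard_index(created: date) -> int:
    """Index of the shard whose date range contains this creation date."""
    idx = bisect.bisect_right(UPPER_BOUNDS, created)
    if idx >= len(SHARD_NAMES):
        raise ValueError(f"no shard configured for {created}")
    return idx


def shards_for_report(start: date, end: date) -> list[str]:
    """All shards a report over the inclusive range [start, end] must query."""
    return SHARD_NAMES[shard_index(start):shard_index(end) + 1]


print(shards_for_report(date(2011, 12, 1), date(2012, 6, 17)))
```

Growing the system under this scheme means appending a new bound and a new server for future dates, while older shards stay untouched.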