8

I have a buddy who runs a web app for people listing cars for sale. There are a few thousand clients who use it, and each client has hundreds and sometimes thousands of rows in the database (some have been on for 5 years with hundreds of cars selling each month, and 10s of rows per sale (comments, messages, etc)). He has run this system in one SQL Server database in one physical server with like 20GB or RAM and a couple processors for the whole time, with no problems. Is this some sort of miracle?

Just like most programmers, I'm no DBA and just get by, thanks to ORMs, etc. Everywhere I look, people talk about having the need to shard or get a separate database server for big users of a web app. Why is this? Is it really that inefficient to have a large DB with lots or rows? Should I plan to use Cassandra or something, or can I rely on scaling up well with Postgres?

orokusaki
  • 55,146
  • 59
  • 179
  • 257
  • 7
    Too big is when trees are getting cut down or old buildings demolished to make room for servers. – BoltClock Sep 10 '10 at 23:20
  • Why do most programmers need DBAs? Don't people learn relational database stuff anymore? Anyway, the deal with sharding and so forth has to do scaling performance when you have 10's of thousands or even millions of users not necessarily the size of the database. – BobbyShaftoe Sep 11 '10 at 18:30
  • 1
    @BobbyShaftoe - The thing about programmers needing DBAs has to do with where programmers came from. Programmers didn't used to be software architects, or logicians. They were machine coders and system admins, as well as DBAs; computer scientists, if you will. With the onslaught of high-level programming languages (e.g. Python, Ruby, et al) new programmers emerged; ones that didn't care about binary, or motherboards, or really computer science at all. I take an interest in it myself, coming not from a computer science background, but I just don't have enough time in the day to learn it all. – orokusaki Sep 11 '10 at 20:04

6 Answers6

9

I personally don't think what you've described is that large of a database. The server (20 gigs of ram? ;)) sounds decent. It's more about usage and design. If the database is indexed and well designed, it can grow much, much larger on the current hardware.

Before doing any sort of switch, I'd simply look at archiving useless data and optimising queries if there's a fear of performance issues.

Jemes
  • 2,822
  • 21
  • 22
  • 1
    I don't think it's anywhere near large. In terms of efficiency, decide on a measure or measures and do some sizing, it can be fun. The log might need truncating if it's been running for 5 years! – MikeAinOz Sep 10 '10 at 23:47
6

The reason for sharding and separate db servers is that at some point it's going to be cheaper to use multiple cheaper machines than one expensive one. Hardware price doesn't scale linearly with performance and once you reach a certain point it'll be much cheaper to get twice as many machines as to get a machine that's twice as fast.

Davy8
  • 30,868
  • 25
  • 115
  • 173
  • Very interesting consideration - can you give at least a very rough example in pricing-performance ratio? Even an outdated one would be good, I'm just interested, how does it look like in practice. – Zoltán Schmidt Feb 13 '16 at 06:37
3

You should have no problem in SQL server, Oracle, or any modern relational or non-relational database. I have administered databases with 100's of millions of records and Terabytes of data.

David
  • 34,223
  • 3
  • 62
  • 80
Dustin Laine
  • 37,935
  • 10
  • 86
  • 125
3

Typically you split components up across different servers so you can manage up time, resilience, and performance more easily.

It's certainly quite possible to have one monster machine which does it all, but then you may need another monster machine in case your motherboard dies, or your datacenter is unavailable.

By splitting a web site or application up, amongst different server's it's easier to get cheaper machines, and more of them. Thus you can build in resilience, and not have components which have similiar demands on hardware clashing.

It's also important to think about restore times for servers, and recovery plans.
What happens when your machine dies, can you replace it in the agreed upon time? Can you restore from backups in that time?

SQL Server or other enterprise class databases shouldn't have any problems with 10's or 100GB databases, as long as they not designed too badly. (We have a few machines with that capacity/use which aren't struggling at all.).

Bravax
  • 10,453
  • 7
  • 40
  • 68
2

In my mind that's nothing. Having tens of millions of rows on multiple tables with database size exceeding 10 GB has not caused problems for MS SQL Server. Of course it is not too fast with that much data, but otherwise it works just fine.

And to answer the question, too big is so big it does cause problems. And when it starts causing problems depends on the table structure and your performance demands.

Carlos
  • 2,503
  • 26
  • 29
2

Databases are extremely efficient at storing and retrieving relational data (i.e. data that is structured and has references to other data) - that's what they're designed to do. Honestly, 99% of the people spewing about key-value stores and Cassandra and whatnot have no clue what they're doing. A database server is just fine for storing large volumes of data, particularly if you're willing to put a bit of work into tuning it properly.

That said, there are use cases for Cassandra et. al. - if you have mostly unstructured key/value data or don't need consistency or want to shard for redundancy, it may be worth investigating.

Unless you're an extremely popular website, you probably can get by just fine with a decent database server - don't switch until you've determined why you need to switch. Switching is fine, just make sure you are switching because it serves your needs better, and not because it's the "cool web-scale thing to do"

Steven Schlansker
  • 37,580
  • 14
  • 81
  • 100
  • I meant to ask you way back when you answered this: What are some of the rudimentary obvious steps in tuning a DB (aside from tuning your queries and avoiding extraneous queries, which is about all I currently know how to do)? – orokusaki May 22 '11 at 03:46