
What kind of architecture is needed to store 100 TB of data and query it with aggregation? How many nodes? What disk size per node? What would be the best practice?

Every day 240 GB will be written, but the total size will stay the same because the same amount of data will be deleted.

Or do you have any other ideas for storing the data and running fast group-by queries?
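The numbers in the question can be turned into churn and ingest rates as a rough starting point for sizing. This is a minimal sketch that assumes decimal units (1 TB = 1000 GB) and writes spread evenly over the day. Real peak rates will be higher than this average, and deletes run at the same rate as writes.

```python
# Back-of-the-envelope sizing for the workload described in the question.
# Assumptions: decimal units (1 TB = 1000 GB, 1 GB = 1000 MB) and an even
# write rate over 24 hours.
TOTAL_GB = 100 * 1000          # 100 TB kept at steady state
DAILY_GB = 240                 # written (and deleted) every day
SECONDS_PER_DAY = 24 * 60 * 60

daily_churn = DAILY_GB / TOTAL_GB                     # fraction of the data replaced per day
turnover_days = TOTAL_GB / DAILY_GB                   # days until every byte has been replaced once
avg_write_mb_s = DAILY_GB * 1000 / SECONDS_PER_DAY    # average ingest rate in MB/s

print(f"daily churn: {daily_churn:.2%}")              # daily churn: 0.24%
print(f"full turnover: {turnover_days:.0f} days")     # full turnover: 417 days
print(f"average write rate: {avg_write_mb_s:.2f} MB/s")  # average write rate: 2.78 MB/s
```

An average of under 3 MB/s is modest. The harder parts of this workload are likely the total volume and deleting old data efficiently, not the raw write rate.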

canseverayberk

2 Answers


Kindly refer to the related question:

MongoDB limit storage size?

Quoting from the top answer:

The "production deployments" page on MongoDB's site may be of interest to you. Lots of presentations listed with infrastructure information. For example:

http://blog.wordnik.com/12-months-with-mongodb says they're storing 3 TB per node.
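If you take the ~3 TB per node figure at face value, the node count follows from simple division. This sketch rests on assumptions that do not come from the linked post: 100 TB is the logical data size, each node holds 3 TB of it, and every shard is kept on 3 replicas, which triples the raw storage. `nodes_needed` is a hypothetical helper.

```python
import math

# Rough node count based on the ~3 TB per node figure quoted above.
# Assumptions (not from the source): 100 TB of logical data, 3 TB per node,
# and each shard stored on REPLICAS copies.
DATA_TB = 100
TB_PER_NODE = 3
REPLICAS = 3


def nodes_needed(data_tb: float, tb_per_node: float, replicas: int) -> int:
    """Return how many nodes are needed to hold data_tb with the given replication."""
    shards = math.ceil(data_tb / tb_per_node)  # data-bearing nodes for one copy of the data
    return shards * replicas                   # every shard is stored `replicas` times


print(nodes_needed(DATA_TB, TB_PER_NODE, REPLICAS))  # 102
```

Under these assumptions you need 34 shards with 3 replicas each, or 102 nodes. Changing `TB_PER_NODE` or `REPLICAS` shows how sensitive the estimate is to those two choices.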

Samuel Liew

I highly recommend HBase.

Facebook uses it for its Messages service, which in Nov 2010 was handling 15 billion messages a day.

We tested MongoDB for a large data set but ended up going with HBase and have been happily using it for months now.
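The question also asks about "fast group queries". Aggregation over data in HBase can be run as a Hadoop MapReduce job, and the comment below mentions Amazon Elastic MapReduce. The following toy, single-process sketch shows only the map, shuffle and reduce data flow behind a GROUP BY. It does not use the HBase or Hadoop APIs, and the row layout `(user_id, country, bytes_sent)` is hypothetical. In a real cluster, the map and reduce steps run in parallel on the nodes that hold the data.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical input rows: (user_id, country, bytes_sent).
Row = tuple[str, str, int]


def map_phase(rows: Iterable[Row]) -> Iterator[tuple[str, int]]:
    """Emit one (group key, value) pair per row: here (country, bytes_sent)."""
    for _user_id, country, bytes_sent in rows:
        yield country, bytes_sent


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Collect all values for the same key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Aggregate each group independently: SUM(bytes_sent) GROUP BY country."""
    return {key: sum(values) for key, values in groups.items()}


rows = [("u1", "us", 300), ("u2", "de", 500), ("u3", "us", 400)]
print(reduce_phase(shuffle_phase(map_phase(rows))))  # {'us': 700, 'de': 500}
```

Each reduce call touches only one key's values, so the framework can spread the groups across many machines. This is why the pattern scales to large data sets.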

Suman
  • How did you handle infrastructure management? We're a small startup and don't have the resources yet to do it at 100%. – noli Jun 03 '14 at 20:23
  • Sorry, maybe I'm not understanding: what do you mean by infrastructure management? Do you mean managing the Hadoop/HBase cluster? We used Amazon Elastic MapReduce. – Suman Jun 03 '14 at 20:49