5

I'm curious what software companies like Dropbox, RapidShare, Hotfile, and others use to manage huge numbers of files. Is there an open source system for this? I took a look at GridFS, which is built on MongoDB. It doesn't look like the best choice because of its speed (compared to serving files with nginx). Or am I wrong?

I want a system that can scale indefinitely by plugging more servers into it, at least to 100 TB.

Pol

6 Answers

3

Check out MogileFS (http://danga.com/mogilefs/), an open source distributed filesystem developed by Danga Interactive for use with their LiveJournal.com service.

If you don't want to, or cannot, use cloud services like Amazon's S3 and prefer to run your own servers, then MogileFS might be the right choice. MogileFS is reliable, and managing and scaling it is easy and cheap.

Unfortunately, I cannot provide any performance comparison or benchmarks against other filesystems. But you shouldn't expect MogileFS to take first place on speed, because it works at the application level, which can also be an advantage.

See also the Google Code page for more information: http://code.google.com/p/mogilefs/

2

GlusterFS is an open source distributed file system. Unlike HDFS, it doesn't have a centralized metadata server, which means GlusterFS has no metadata server acting as a single point of failure.
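To illustrate how a system can locate files without a central metadata server, here is a minimal sketch of hash-based placement. It is a toy example of the general idea, not GlusterFS's actual algorithm; the function `pick_server` and the server names are hypothetical.

```python
import hashlib

def pick_server(path, servers):
    """Map a file path to a server deterministically by hashing the path."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer bucket selector.
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

# Hypothetical server names, for illustration only.
servers = ["node-a", "node-b", "node-c"]

for path in ["/photos/1.jpg", "/photos/2.jpg", "/docs/report.pdf"]:
    # Any client can compute the location on its own; no lookup service is needed.
    print(path, "->", pick_server(path, servers))
```

Note that plain modulo placement moves most files when a server is added; production systems typically use consistent hashing or hash ranges to limit that reshuffling.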

Wilk
Lawcen
2

Dropbox is built on Amazon's S3; see "Dropbox - Where are my files stored".

You can find some open source options in the question "Alternatives to Amazon S3".

Community
James Avery
2

Hadoop's HDFS is a scalable file system. Another option is GlusterFS.

Arnon Rotem-Gal-Oz
0

For anyone who finds this question through Google:

Facebook has so many files that they had to write their own storage system; many files are stored together in one giant file with markers between them. This is done to reduce the number of files in the underlying file system.

What you need seems to be something like the Google File System (GFS), on which Bigtable is built.
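The idea of packing many files into one large file can be sketched as follows. This is a minimal illustration under assumptions, not Facebook's actual format: the `PackedStore` class, the `BLOB` marker, and the record layout (marker, length, payload) are all hypothetical.

```python
import io
import struct

# Hypothetical record layout: 4-byte marker, 4-byte big-endian length, then payload.
MAGIC = b"BLOB"
HEADER = struct.Struct(">4sI")

class PackedStore:
    """Append many small blobs to one file-like object; keep offsets in memory."""

    def __init__(self, fileobj):
        self.f = fileobj
        self.index = {}  # key -> (payload offset, payload size)

    def put(self, key, data):
        # Always append at the end of the single big file.
        self.f.seek(0, io.SEEK_END)
        start = self.f.tell()
        self.f.write(HEADER.pack(MAGIC, len(data)))
        self.f.write(data)
        self.index[key] = (start + HEADER.size, len(data))

    def get(self, key):
        # One seek and one read, with no per-file lookup in the file system.
        offset, size = self.index[key]
        self.f.seek(offset)
        return self.f.read(size)

store = PackedStore(io.BytesIO())  # an in-memory stand-in for a large file on disk
store.put("a.jpg", b"first image bytes")
store.put("b.jpg", b"second")
print(store.get("b.jpg"))
```

The in-memory index maps each key directly to a byte range, so the underlying file system only ever sees one file regardless of how many blobs are stored.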

Adrian
0

What do you mean by "GridFS has capacity limits"? Can you please be more specific, or point to the documentation where you read that? I am not aware of any capacity limits with GridFS.

Sid
  • GridFS has some speed limits: http://www.coffeepowered.net/2010/02/17/serving-files-out-of-gridfs/ – Pol Feb 22 '12 at 18:22
  • Again, I don't see any reference to GridFS capacity limits (you asked about scaling indefinitely). The first link refers to the capacity of the system (hardware) running MongoDB, and the second one is about GridFS speed. I thought your question was about scalability, and I still don't see any limit there. – Sid Feb 22 '12 at 22:32