
I am developing a new website and want to use GridFS as storage for all user uploads, because it offers a lot of advantages compared to normal filesystem storage.

Benchmarks with GridFS served by nginx indicate that it's not as fast as a normal filesystem served by nginx.

Benchmark with nginx

Is there anyone out there who already uses GridFS in a production environment, or would use it for a new project?

Railsmechanic
  • A blog post on storing images in MongoDB, for future searchers who had a similar intent to mine: http://menge.io/2015/03/24/storing-small-images-in-mongodb/ (compares GridFS with simply throwing it into the doc as binary data) –  Mar 30 '17 at 14:52
  • There are a lot of trade-offs to consider when deciding if you want to store binary data in MongoDB - see: https://alexmarquardt.com/2017/03/02/trade-offs-to-consider-when-storing-binary-data-in-mongodb/ – Alexander Marquardt Jun 07 '18 at 07:07

5 Answers


I use GridFS at work on one of our servers, which is part of a price-comparison website with respectable traffic (around 25k visitors per day). The server doesn't have much RAM (2 GB), and the CPU isn't particularly fast (Core 2 Duo, 1.8 GHz), but it has plenty of storage space: 10 TB (SATA) in a RAID 0 configuration. The job the server does is very simple:

Each product on our price comparer has an image (there are around 10 million products according to our product DB). If the image is not yet in the grid, the server's job is to download it, resize it, store it in GridFS, and deliver it to the visitor's browser; if it is already stored in the grid, the server simply delivers it. So this could be called a 'traditional CDN schema'.
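The flow described above is essentially a read-through cache. Here is a minimal Python sketch of that idea, not the PHP script actually used on this server: a plain dict stands in for the GridFS bucket, and `ImageCache`, `fake_fetch`, and `fake_resize` are hypothetical names invented for illustration.

```python
from typing import Callable, Dict


class ImageCache:
    """Read-through image cache; a dict stands in for the GridFS bucket (assumption)."""

    def __init__(self, fetch: Callable[[str], bytes], resize: Callable[[bytes], bytes]):
        self._store: Dict[str, bytes] = {}
        self._fetch = fetch      # downloads the original image (hypothetical helper)
        self._resize = resize    # produces the resized version (hypothetical helper)
        self.misses = 0

    def get(self, product_id: str) -> bytes:
        cached = self._store.get(product_id)
        if cached is not None:
            # Already in the grid: deliver it directly.
            return cached
        # Not in the grid yet: download, resize, store, then deliver.
        self.misses += 1
        image = self._resize(self._fetch(product_id))
        self._store[product_id] = image
        return image


def fake_fetch(product_id: str) -> bytes:
    # Stand-in for downloading the image from the merchant's website.
    return f"raw:{product_id}".encode()


def fake_resize(data: bytes) -> bytes:
    # Stand-in for real image resizing: just truncate the payload.
    return data[:8]


cache = ImageCache(fake_fetch, fake_resize)
first = cache.get("sku-1")   # miss: fetched, resized, stored
second = cache.get("sku-1")  # hit: served from the store
```

In a real setup the dict would be replaced by GridFS reads and writes through your driver; only the second request for a given product avoids the download and resize work.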

We have stored and processed 4 million images on this server since it went live. The resizing and storing is done by a simple PHP script, though a Python script, or something like Java, could certainly be faster.

Current data size : 11.23g

Current storage size : 12.5g

Indices : 5

Index size : 849.65m

About reliability: it is very reliable. The server is not under heavy load, the index size is fine, and queries are fast.

About speed: it is certainly not as fast as local file storage, maybe 10% slower, but it is fast enough to be used in real time even when the image needs to be processed, which in our case depends heavily on PHP. Maintenance and development times have also been reduced: deleting one or several images has become very simple, you just query the DB with a simple delete command. Another interesting thing: when we rebooted our old server, which used local file storage (so millions of files in thousands of folders), it sometimes hung for hours because the system was performing a file integrity check. We no longer have this problem with GridFS; our images are now stored in big MongoDB chunks (2 GB files).

So, in my opinion: yes, GridFS is fast and reliable enough to be used in production.

Wilfred Hughes
Manu Eidenberger
  • I am shocked that anyone would use RAID 0 as their primary storage on a production web site. Even with good backups, increasing the probability of a storage failure is a pretty steep price to pay for improved performance. – mikerobi May 12 '11 at 01:17
  • We use RAID 0 because in our particular case, image data can be volatile. It doesn't matter if the image is lost since we will download it again from the merchant's website. Pragmatically, we could consider that our server is a simple image cache server. – Manu Eidenberger May 12 '11 at 21:17
  • But you're actively increasing the chance of failure (initial drive failure factor multiplied by the number of spindles). RAID 10 would be ideal if you need more writes than reads, or RAID 5/6 if you need more reads than writes. – NeuroScr Apr 25 '14 at 01:36
  • @ManuEidenberger Why are you using GridFS for images that could instead be stored in a MongoDB document? I guess you did not reach the 16 MB document size limit. Storing the image as a BLOB within a MongoDB document would be more efficient, since you would not need the GridFS layer on top of MongoDB documents. – Arnaud Bouchez May 13 '15 at 14:27
  • I'm also curious about @ArnaudBouchez's question. Was there some benefit that made you choose GridFS over simply storing it as binary data in a document, Manu? Thanks! –  Mar 30 '17 at 14:32
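To put a rough number on the RAID 0 point raised in the comments: a striped array is lost as soon as any one drive fails. The sketch below assumes independent drive failures, and the 3% per-drive failure probability and 4-drive array are made-up illustrative values, not figures from this server.

```python
def raid0_failure_probability(p_drive: float, drives: int) -> float:
    """Probability that at least one of `drives` independent drives fails."""
    return 1 - (1 - p_drive) ** drives


# Hypothetical inputs: 3% failure probability per drive, 4 drives in the stripe.
p_array = raid0_failure_probability(0.03, 4)
# The "multiply by the number of spindles" rule of thumb is an upper bound.
rule_of_thumb = 0.03 * 4
```

For small per-drive probabilities the exact value sits just below the rule of thumb, so "failure rate times number of spindles" is a reasonable approximation.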

As mentioned, it might not be as fast as an ordinary filesystem, but it gives you many advantages over ordinary filesystems, which I think are worth giving up a bit of speed for.

Ultimately, with sharding, you might reach a point where GridFS storage actually becomes the faster option compared with an ordinary filesystem on a single node.

evdama

A heads-up on repairs for larger DBs, though: on a new system we're developing, mongo didn't exit cleanly, and repairing the 7 TB GridFS looks like it will take 130 hours.

Because of this, I think I'll look at switching to OpenStack Swift or Ceph. Still, until then it was good. And the nginx-gridfs module is sweet.

Nick

mdirolf's nginx-gridfs module is great and fairly easy to set up. We're using it in production at paint.ly to serve all of the paintings, and there have been no problems so far.

schallis

I don't recommend using GridFS unless you know what you are doing. GridFS is just an abstraction layer that splits files into chunks and stores them in two collections. More files means more overhead. If you expect your files to be roughly the same size, not exceeding 32 MB or so, you are on the right track. Do not try to store large files in GridFS. Why?

  1. Drivers for some languages may read the whole file (i.e. all of its chunks) even when you only need a small part of it.
  2. Modifying the file may affect all chunks and increase database load.

If your file store keeps growing, you will have to decide whether to shard GridFS. Be careful! Consistency is not guaranteed while sharding is being initialized!

If you are planning a read-heavy project, consider storing the files directly in documents (if they are 16 MB or smaller), or choose another cluster filesystem and link the filename/inode to your application logic.
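To make the "splits files into chunks and stores them in two collections" point concrete, here is a rough pure-Python sketch of the layout. It does not talk to MongoDB: the dicts only mimic the shape of the `fs.files` and `fs.chunks` documents, the 255 KiB chunk size is assumed to be the driver default, and `split_into_chunks` and `chunks_for_range` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

CHUNK_SIZE = 255 * 1024  # assumed default GridFS chunk size (255 KiB)


def split_into_chunks(file_id: str, data: bytes,
                      chunk_size: int = CHUNK_SIZE) -> Tuple[Dict, List[Dict]]:
    """Mimic GridFS: one metadata document plus one document per chunk."""
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    chunks = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def chunks_for_range(start: int, length: int,
                     chunk_size: int = CHUNK_SIZE) -> range:
    """Chunk numbers `n` touched by the byte range [start, start + length)."""
    first = start // chunk_size
    last = (start + length - 1) // chunk_size
    return range(first, last + 1)


# A 600,000-byte file ends up as one files document and three chunk documents.
files_doc, chunks = split_into_chunks("img-1", b"x" * 600_000)
# Reading 1,000 bytes at offset 300,000 only needs the chunk with n == 1.
needed = list(chunks_for_range(300_000, 1_000))
```

A driver that computes the needed chunk numbers this way can serve a small read from a single chunk; one that does not will pull every chunk of the file, which is the overhead point 1 warns about.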

Hope this helps.

Vitaly Greck
  • I'm fairly new to GridFS, though from what I understand it is more than just an abstraction layer that doubles the number of files. GridFS provides a simple way of taking advantage of MongoDB's replication and sharding features. I believe others have also mentioned that files are stored in 2GB chunks, which I imagine would reduce the total number of files, especially if someone has a very large number of small images. –  Nov 28 '13 at 04:46
  • +1 You are right. Smaller files would not benefit from being stored with GridFS either. If your file fits in a MongoDB document (i.e. it is below the 16 MB size limit), you would be better off storing it as a BLOB within a MongoDB document. That bypasses the overhead of using GridFS on top of MongoDB storage. See https://www.compose.io/articles/gridfs-and-mongodb-pros-and-cons/ – Arnaud Bouchez May 13 '15 at 14:29