
The Java web app I'm developing allows users to upload files (pictures and documents) to their profiles and define access rules for those files (that is, which other users are able to view or download each file). The access control / permission system is custom-made, and the rules are stored in MongoDB alongside the user's profile and the actual file entry.

Since I need both the application and the storage to be distributed and fault-tolerant, I need to figure out the best strategy for file storage.

Should I store the files inside MongoDB, in the files collection where the file document containing the description and access rules is located?

Or should I store the files in the server's file system and keep the path in the MongoDB document? With the filesystem approach, will I still be able to enforce the user-defined access permissions, and how? Finally, with the filesystem approach, how do I distribute files across servers? Should I use dedicated servers for this, or can I store the files on the webapp servers or the MongoDB servers?

Thanks a lot for all your insights! Any help or feedback appreciated.

Alex

azpublic
    MongoDB's GridFS gives you "distributed and fault-tolerant", and you already have it configured. You did not mention "performance", which is what these DB-vs-filesystem questions are usually about; I cannot speak to that. FWIW, I am just starting out on a similar thing, and we are trying to put everything in GridFS (with local filesystem caching). We will see how that goes. – Thilo Dec 06 '11 at 09:24
    Maybe this helps: http://stackoverflow.com/questions/3413115/is-gridfs-fast-and-reliable-enough-for-production – Thilo Dec 06 '11 at 09:25

1 Answer


There are several alternatives:

  • put files in a storage service (e.g. S3): easy, with plenty of space, but poor performance
  • put files in a local filesystem: fast, but doesn't scale
  • put files in MongoDB documents: easy, powerful, and scalable, but each document is limited to 16 MB
  • use MongoDB's GridFS layer: functionality is limited, but it is built for scalability (thanks to sharding) and is fairly fast too. Note that you can put information about the file (permissions, etc.) right into the file's metadata object.
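
To illustrate the point about keeping permissions in the file's metadata: wherever the bytes end up, the webapp can load the metadata first and only stream the file if the check passes. Below is a minimal sketch of such a check in plain Java, with no MongoDB driver involved. The names `StoredFileMeta`, `allowedViewerIds`, and `FileAccessSketch.canRead` are hypothetical and stand in for whatever your custom permission model actually stores.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory view of the metadata stored next to a file. */
final class StoredFileMeta {
    final String fileId;
    final String ownerId;
    final Set<String> allowedViewerIds;

    StoredFileMeta(String fileId, String ownerId, Set<String> allowedViewerIds) {
        this.fileId = fileId;
        this.ownerId = ownerId;
        // Defensive copy so later changes to the caller's set cannot alter the rules.
        this.allowedViewerIds =
                Collections.unmodifiableSet(new HashSet<>(allowedViewerIds));
    }
}

/** Hypothetical access check the webapp runs before serving any file bytes. */
final class FileAccessSketch {
    private FileAccessSketch() {}

    /** Returns true if the requesting user may view or download the file. */
    static boolean canRead(StoredFileMeta meta, String requestingUserId) {
        if (requestingUserId == null) {
            return false; // anonymous requests are rejected in this sketch
        }
        // The owner can always read; everyone else must be listed explicitly.
        return requestingUserId.equals(meta.ownerId)
                || meta.allowedViewerIds.contains(requestingUserId);
    }
}
```

The same check applies to a filesystem layout as long as files are never exposed directly by the web server and every download goes through application code that calls something like `canRead` first.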

In your case, it sounds like the last option may be best; quite a few users have switched from a filesystem to GridFS, and it worked very well for them. Things to keep in mind:

  • GridFS sharding works but is not perfect: usually only the data is sharded, not the metadata. This is not a big deal, but the shard holding the metadata must be very safe.
  • it can be beneficial to run GridFS in a separate MongoDB cluster from your core data, since the requirements (storage, backup, etc.) are usually different.
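
As a footnote to the 16 MB document limit mentioned in the list of alternatives: if you ever mix the two MongoDB options, the decision can be made per upload from the file size. This is only a rough sketch; `StorageChooser`, `StorageTarget`, and the 64 KB headroom are assumptions made for illustration, and only the 16 MB figure comes from the limit described above.

```java
/** Hypothetical label for where an uploaded file should be stored. */
enum StorageTarget { INLINE_DOCUMENT, GRIDFS }

final class StorageChooser {
    /** Maximum size of a single MongoDB document: 16 MB. */
    static final long MAX_DOCUMENT_BYTES = 16L * 1024 * 1024;

    /** Assumed headroom for description, access rules, and encoding overhead. */
    static final long METADATA_HEADROOM_BYTES = 64L * 1024;

    private StorageChooser() {}

    /** Picks inline storage only when the file plus headroom fits in one document. */
    static StorageTarget choose(long fileSizeBytes) {
        if (fileSizeBytes < 0) {
            throw new IllegalArgumentException("file size must not be negative");
        }
        long inlineLimit = MAX_DOCUMENT_BYTES - METADATA_HEADROOM_BYTES;
        return fileSizeBytes <= inlineLimit
                ? StorageTarget.INLINE_DOCUMENT
                : StorageTarget.GRIDFS;
    }
}
```

In practice, it is probably simpler to send everything through GridFS, as suggested above, and treat this split only as an optimization for many small files.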
ajg