0

As the title says, what is the preferred way of saving an uploaded file in a Java EE web application? I have read in answers to other questions that saving the file to the servlet container's filesystem is not recommended, but without further explanation. Some say you should save it to a database (which I doubt, based on what I have read earlier), and some say you should use JCR, where the only implementation I can find is Apache Jackrabbit, which doesn't seem to be very active.

What would be the best option? Are there options other than those mentioned? Reasons why you would choose one over the other are appreciated.

LuckyLuke

3 Answers

3

Depending on your environment you'll probably want to do one of a few things:

  1. Your server is in the cloud. You'll want to use a shared cloud storage service such as Amazon S3 (which has a nice API, by the way).

  2. You are hosted on a traditional server. In this case the best practice would be to use a shared NAS, but cloud storage is also an option unless your client has regulatory concerns.

  3. You are primarily dealing with many small(er) files and you want them to be searchable. For this scenario you'd choose a BLOB database column.

  4. You are handling large files (like video). You'll probably want to look into NAS/cloud storage instead and use the database only to hold a reference to the NAS/S3 location.

The reason for these options is that you don't want to tie your data to a single running instance. This architecture lets you bring additional instances of your application online, or migrate to another server, and still have access to the shared data.
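As a rough illustration of keeping the data outside the running instance, here is a minimal sketch in Java. The names `FileStore` and `SharedDirectoryFileStore` are made up for this example, and the root directory is assumed to be a mount that every instance can reach (for example a NAS share). An S3-backed implementation could sit behind the same interface, and the database would only need to store the returned key.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical abstraction: callers only ever see opaque keys, never paths. */
interface FileStore {
    String save(InputStream content) throws IOException;
    InputStream open(String key) throws IOException;
}

/** Stores uploads under a root directory assumed to be shared by all instances. */
class SharedDirectoryFileStore implements FileStore {
    private final Path root;

    SharedDirectoryFileStore(Path root) throws IOException {
        this.root = Files.createDirectories(root);
    }

    @Override
    public String save(InputStream content) throws IOException {
        // A generated key avoids trusting the client-supplied file name.
        String key = UUID.randomUUID().toString();
        Path target = resolve(key);
        Files.createDirectories(target.getParent());
        Files.copy(content, target);
        return key; // store this key in the database as the reference
    }

    @Override
    public InputStream open(String key) throws IOException {
        return Files.newInputStream(resolve(key));
    }

    private Path resolve(String key) {
        // Throws IllegalArgumentException for keys that are not UUID-shaped,
        // which also rejects path separators such as "../".
        UUID.fromString(key);
        // Spread files over subdirectories so no single directory grows huge.
        return root.resolve(key.substring(0, 2)).resolve(key);
    }
}
```

Because the application code depends only on the interface, moving from a NAS mount to cloud storage later would mean swapping the implementation rather than touching the upload handling.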

Erich
1

JCR, as you've already seen, isn't all that popular. Using the filesystem is not a very good idea, both from a platform perspective (Windows, for example, has limits on maximum file path length, constraints on legal file names, and slows to a crawl once a directory holds more than ~100K files) and from an architecture perspective. Think about clustering your application:

If you use any form of local storage you won't be able to cluster easily (as not all files are accessible from all nodes), so you need to choose something accessible from all cluster nodes. A DB is a good fit for that. Some sort of cluster cache (or Hadoop) might also be a good fit, depending on the specifics of your problem.
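For the database route, a minimal JDBC sketch might look like the following. The class name `BlobFileDao`, the table `uploaded_file`, and its columns are assumptions made up for this example; the actual BLOB column type and size limits depend on the database and driver you use.

```java
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical DAO that streams an upload into a BLOB column. */
class BlobFileDao {
    // Assumed schema: uploaded_file(name VARCHAR, content BLOB)
    static final String INSERT_SQL =
            "INSERT INTO uploaded_file (name, content) VALUES (?, ?)";

    static void save(Connection conn, String name, InputStream content)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, name);
            // Hands the stream to the driver instead of building a byte[] here.
            ps.setBinaryStream(2, content);
            ps.executeUpdate();
        }
    }
}
```

Since every cluster node talks to the same database, any node can then read a file that another node stored.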

radai
  • But should files be stored in databases? – LuckyLuke Apr 25 '13 at 18:58
  • Most DBs have very decent BLOB storage capabilities that can scale up to multiple TBs of data. Unless you're doing something really drastic, I don't see any reason why not. – radai Apr 25 '13 at 19:02
  • Right, maybe I should reconsider database storage then. So you won't get problems with some thousands of jpg/png images of size 40 KB to a couple of MB? And it is not a bad alternative to file system storage? – LuckyLuke Apr 25 '13 at 19:09
  • Storing millions of images is actually one of the scenarios that MS SQL Server was designed for. I'd say most engines will handle it just fine. Every CMS I've met saves images in a database. – Jesan Fafon Apr 25 '13 at 19:11
  • Absolutely no issues for several thousand files. The DB might be bigger than the total size of the files due to BLOB storage overhead (see http://stackoverflow.com/questions/4659441/mysql-blob-vs-file-for-storing-small-png-images), but nothing significant in absolute terms. Also, the larger the files, the less the overhead is felt. – radai Apr 25 '13 at 19:12
0

In my opinion, the answer to this question depends on what you want to save. Big files like HD video are accessed much faster via the filesystem. Using a database, on the other hand, makes things easier because you don't have to know where the files are actually saved.

A small number of small files: database. Otherwise: filesystem.

Another pro for using the filesystem as storage is the ability to implement a full-text search framework like Apache Lucene.

Lukas Eichler
  • Lucene stores its index separately from the actual files. You could have a Lucene index for files stored in a DB as well. – radai Apr 25 '13 at 19:03