2

In my project (similar to mediafire and rapidshare), clients can upload files to the server. I am using DB2 database and IBM WAS web server and JSP as server side scripting. I am creating my own encryption algorithm, as it is the main aim of the project.

I need suggestion whether files themselves should be stored in the database or if only the location of the files should be stored. Which approach is best?

Brett Kail
  • 33,593
  • 2
  • 85
  • 90
Arun prasath
  • 55
  • 1
  • 6
  • 1
    By the way, creating your own encryption algorithm is _hugely_ frowned upon (via this [question/answer](http://security.stackexchange.com/questions/18197/why-shouldnt-we-roll-our-own) on the security site here.). I'm sure there are existing libraries for your situation... besides the fact that DB2 can encrypt column contents automatically. – Clockwork-Muse Jan 08 '13 at 16:55

3 Answers3

3

There are Pros and Cons for storing BLOBs in the database.

Advantages

  • DBMS support for BLOBs is very good nowadays
  • JDBC driver support for BLOBs is very good
  • access to the "documents" can happen inside a transaction. No need to worry about manual cleanup or "housekeeping". If the row is deleted, so is the BLOB data
  • Don't have to worry about filesystem limits. Filesystems are typically not very good at storing million of files in a single directory. You will have to distribute your files across several directories.
  • Everything is backed up together. If you take a database backup you have everything, no need to worry about an additional filesystem backup (but see below)
  • Easily accessible through SQL (no FTP or other tools necessary). That access is already there and under control.
  • Same access controls as for the rest of the data. No need to set up OS user groups to limit access to the BLOB files.

Disadvantages

  • Not accessible from the OS directly (problem if you need to manipulate the files using commandline tools)
  • Cannot be served by e.g. a webserver directly (that could be performance problem)
  • Database backup (and restore) is more complicated (because of size). Incremental backups are usually more efficient in the filesystem
  • DBMS cache considerations
  • Not suited for high-write scenarios

You need to judge for yourself which advantage and which disadvantage is more important for you.

I don't share the wide-spread assumption that storing BLOBs in a database is always a bad idea. It depends - as with many other decisions.

  • 1
    Also, if you need to replicate the environment, you only need to restore a file when includind files in the database. In the other case, the files have to be copied independently. Also, for High Availability it is better to include files in the database. – AngocA Jan 08 '13 at 15:08
  • @AngocA: the part about restoring, is what I meant with "Backup everything together". –  Jan 08 '13 at 15:11
2

It's general knowledge that storing files in the database -especially big ones- it's generally a bad idea. There are brilliant explanations in these questions:

Storing a file in a database as opposed to the file system?

Storing Images in DB - Yea or Nay?

And I'd like to highlight some points myself:

  • Storing files in your DBMS will make your data very big, and big databases are a maintaining hell (specially backups)
  • Portability becomes an issue, as every DBMS vendor makes its own implementation of BLOB files
  • There's a performance lost related to SELECT sentences to BLOB fields, compared to disk access
Community
  • 1
  • 1
Carlos Gavidia-Calderon
  • 7,145
  • 9
  • 34
  • 59
  • Here is a research that shows the opposite (at least for certain filesizes) regarding the performance: http://research.microsoft.com/apps/pubs/default.aspx?id=64525 –  Jan 26 '13 at 16:46
0

Well my Opinion would be to store the relevant information like path, name, description, etc... in the database and keep the file evtl. encrypted on the filesystem, it would be cheaper to scale your system adding a webserver than adding a database one as webspace is cheap comparing with databases, all you will need then is to add an IP column to your database or server name so you can address teh new webserver.

CloudyMarble
  • 36,908
  • 70
  • 97
  • 130