1

Possible Duplicate:
When to store images in a database(mySQL) and when not? (Linking image to item in database)
Which is the best method to store files on the server (in database or storing the location alone)?

What should I consider to best choose between storing MP3s/pdfs with a database. Is it better to store the file as a BLOB in the DB? or is it better to just have a field that allows me to look at the filesystem?

The files in questions are not likely to exceed 15MiB, but all likely to be at least 1MiB, in case that's a consideration.

The data will be accessed via PHP, if that makes a difference...

Edit: I'll go with what I was already doing :) resources on the filesystem with filenames in the database. Thanks or the input!

Community
  • 1
  • 1
dainichi
  • 103
  • 1
  • 10
  • 4
    consider how nice is to copy/move/backup a file in filesystem. consider painful database backups with files in DB. etc – Peter Jan 16 '13 at 21:56
  • 5
    This has been asked a million times. There are good reasons for both solutions. For small/medium files, the DB might actually be faster. At least one DBMS is: http://research.microsoft.com/apps/pubs/default.aspx?id=64525 (I wouldn't be surprised if one couldn't find similar thresholds for other DBMS like Postgres, Oracle or MySQL) –  Jan 16 '13 at 21:56
  • 1
    or here: http://stackoverflow.com/questions/8407372/when-to-store-images-in-a-databasemysql-and-when-not-linking-image-to-item-i/8407827#8407827 –  Jan 16 '13 at 22:00
  • To be fair, I did look for duplicates. I guess my stackoverflow-fu is not quite as good as my google-fu! I wish I could give point sot both answers, but as Mathew Foscarini had a more indepth and broad consideration, I feel I must select his answer. Much thanks to Mike Brant for additional information to consider! – dainichi Jan 18 '13 at 00:46

3 Answers3

5

I've been working on a project for the last 3 years where large attachments were stored in the database.

Advantages

  • Managing records in a database is easier and less code than having to handle associated files.
  • Even with CakePHP's behaviors to automate the task. The database is easier.
  • Some projects have security requirements to keep data in the database (even if it doesn't really improve security).
  • Automated build processes can easily backup the data, because it's already in the database anyway.

Disadvantages

  • Taking snapshots of the database for development is a pain.
  • After you pass several GBs in size. No matter what people tell you. You'll start to sweat wondering if the DB engine can handle it.
  • Dumping the database starts to take very long.
  • Data migration to a different schema starts to require non-SQL solutions.

Myths About DB Blobs

  • That your database can't handle a large database file size.
  • That one bad sector on the hard drive means you could lose all your blob data.
  • That there is a performance cost to storing in the database.
  • That it's more secure.
  • That reading from the DB and sending to the client is slower than reading a file and sending it.

All the above (from my experience) is false.

Facts About Large Blobs

  • There will come a time when you no longer want it in the database.
  • GUID record identifiers are your friend. Don't use the record (int)ID in the file name.
  • You don't need to reverse find a record from a file name, but you need to find a file from a record. So you can just use the guid as the file name.
  • It's easier to backup/copy a file system and a small database, then just a large database.
  • Don't store the files in your webroot.
Community
  • 1
  • 1
Reactgular
  • 52,335
  • 19
  • 158
  • 208
  • can I get a +1 before this Q is closed. lol – Reactgular Jan 16 '13 at 22:10
  • 1
    haha! I think this answer definitely deserves a +1 – VictorKilo Jan 16 '13 at 22:24
  • Good summary. I'm not sure I buy the problems with the backup though. DBMS can do very efficient differential/incremental backups as well. And when you need to get a full dump of the database, you'd probably want a "dump" of the files as well, and that would take just as long to copy as the database dump (maybe even longer because there are so many files) –  Jan 16 '13 at 22:54
  • I agree. Dumping 1GB is faster then copying 100,000 files that total 1MB, but my production database is remote. So downloading a 20GB dump just to verify a bug is a pain. – Reactgular Jan 16 '13 at 23:05
  • Thank you Mathew Foscarini. This answered my questions. – dainichi Jan 18 '13 at 00:47
4

I think for most use cases, just storing a filepath is the best approach. To me, the advantages of this approach include:

  • For cases of images or other files that could be downloaded by users, you have the ability to let the client directly download/cache the files.
  • Allows for implementation of a CDN if needed at some point in the future (i.e. just update your file references in the database
  • It will greatly minimize the amount of data being passed back and both between the application and the database, potentially eliminating extra bandwidth usage (if DB is on remote server) and making your queries respond much more quickly (thus not tying up your database).
  • It makes DB backups much faster.
  • It allows you to use traditional file management techniques for the assets.

There are some potential use cases for storing blobs:

  • Perhaps you want to to perform some sort of binary search against the assets (though it can be easily argued there are better solutions for this than MySQL).
  • You need to guarantee for some reason that there are no orphaned files, or entries without an actual file (i.e. make commit of image transactional)
Mike Brant
  • 70,514
  • 10
  • 99
  • 103
  • 1
    You should have listed the disadvantages as well: no transactional storage. Requirement to do house-keeping, two different backup and restore plans, requirement to evenly spread the files in directories (because no filesystem can handle millions of files in a single directory) –  Jan 16 '13 at 22:07
  • @a_horse_with_no_name These are valid points. The reality of the matter is that once you get to the point of having millions of files, you really should be using a different storage solution altogether (rather than server-local storage) something more like Amazon S3. – Mike Brant Jan 16 '13 at 22:17
-2

Best bet is to save the file in an upload directoy, and pass that location onto MySQL

Hydra IO
  • 1,537
  • 1
  • 13
  • 28
  • Allow me to explain in hopes of having that nasty -1 removed :p, it is so much easier to manage in a consolidated location then to deal with corrupted or other such issues storing blob – Hydra IO Jan 16 '13 at 21:59
  • Wow your comment changes everything. Still -1 ;) – Peter Jan 16 '13 at 22:02
  • 2
    Stackoverflow, such a harsh mistress. Maybe op can read the 10k other questions related to the same topic and chose for himself. – Hydra IO Jan 16 '13 at 22:04
  • 1
    Well having files in the filesystem does *not* consolidate anything. You actually spread things around. When you have everything in the database, you really consolidate because you only need a single backup and restore strategy to maintain and test. –  Jan 16 '13 at 22:05