20

Well we all know how many arguments and lives have been lost with the discussion of using databases for file storage (images specifically). I'm in a bit of a pickle when it comes to deciding on how to proceed with my project.

I have a website that allows admins to upload employee pictures. For now, these pictures are stored in BLOB in my MySQL database. Also, I have a windows application that runs alongside the website. This application enables employees to punch in and have their pictures appear when they've successfully done so. The picture is retrieved via a mysql query within the application (from a non-local remote location) that converts the image content to a readable image that's being outputted in a picture box, confirming the identity of the employee.

In my eyes, it is much much easier to have the images stored in the database and have them retrieved via a simple query. I found this a lot easier than storing image paths in the database and having to deal with the application downloading the images. I also don't have to deal with collisions, folder organization and security and paths being re-written for x,y reasons, etc etc.

The images stored in the DB are a mere 20 kb after being cropped to a certain size. My question is, is it still worth embedding the database with image paths or should they simply be stored as they are right now? If storing images in the database is still ill-advised in this case, is there a formal way to store image paths?

Any help on this would be greatly appreciated. If this question doesn't belong here, I'll be happy to move it.

Dimitri
  • 1,906
  • 3
  • 25
  • 49
  • What's the maximum number of users that you are anticipating? – Stewart Aug 08 '14 at 06:02
  • 1
    Well I highly doubt this project will cater to any company holding more than 100-500 employees. These images are employee profiles. – Dimitri Aug 08 '14 at 06:07

2 Answers2

16

If the images are user data, rather than part of your application's code or theme, then storing the images in the database is a good idea, because…

  • Backups are easier to manage if all you have to back up is the database. On the other hand, if you store some application data in the database and some in the filesystem, then you'll have to coordinate the backup schedules of your database and your filesystem to ensure that the two are consistent.

    If you have a database administrator at your disposal, then great! Your backups should already be taken care of. If not, then database backups may be slightly tricky to set up, but once you do have a backup system, it can be better than filesystem backups. For example, many database systems have support for streaming replication.

  • If your application is load-balanced and served by a pool of multiple webservers, then you'll either have to replicate the data to all of the machines, or share them among your servers using a network filesystem.

Of course, having the images on a filesystem also has its advantages, namely in performance and simplicity, since most webservers are built to serve static files. A hybrid approach could give you the best of both worlds:

  • The images stored in the database would be the authoritative data.
  • Your application can have a feature to extract them as files in their local filesystem as a kind of cache. That cache can be rebuilt at any time, since it is not authoritative.
  • The webserver can then serve the files directly from the filesystem.
Alex Trebek
  • 897
  • 6
  • 13
  • The images are strictly employee data and have no relation to the application. The one problem I'm facing is the fact that the windows application is on a remote location. The website might have an arbitrary host name: e.g www.myapp.com (chosen by the company). This would mean I would need to store this in the database, store the folder paths of the images, concatenate the host name and image folder to create the physical url of the image for the windows app to download. If the host name changes, the database much also change concurrently. – Dimitri Aug 08 '14 at 12:54
  • On the contrary, if you store the image data instead of the filenames, you are free to invent arbitrary filenames. The process of rebuilding the filesystem cache will make things consistent again. If you store the filenames, and the files get renamed in the filesystem but not in the database, you're screwed. – Alex Trebek Aug 10 '14 at 23:26
  • Absolutely, storing image data would give the advantage of naming my files whatever I want and it is what has drawn me into using databases and never have to worry about admin accidentally deleting files or tampering with anything else that might be accessible. – Dimitri Aug 11 '14 at 00:19
9

There are several reasons why I think storing images in a database is a bad idea:

1) The server will have timestamp info associated with files that the database won't keep track of. if you ever need to this for forensics reasons, the DB solution will likely be limited in this regard. Feel free to save info about images uploaded regarding IP info, timestamp, etc. in the DB though too.

2) If you ever want these files used by, say, another system/service you'll have to constantly reference the database and interact with it, when you could far more easily just target a specific folder.

3) Any time an image needs to be retrieved, you have to open a connection to the database just to generate it. This may add extra code and steps to things that could be easier to implement by pointing to a folder.

To avoid naming collisions, if I were on a Linux box, I'd use something like a Unix timestamp as a prefix to the filename when it's saved, or simply use that (+ maybe a short random #) as the image ID altogether. So instead of 'jane-image.jpg', it'd be '1407369600_img3547.jpg'. Then, just put a reference to that in the DB and viola, that's a random enough ID where there should never be a collision, unless time starts flowing backwards. Whatever the Windows timestamp equivalent is would be used, obviously.

NOTE: What you're doing now isn't bad and from what it sounds like it may work best for you...but generally speaking I try not to put everything in the hands of a database, just because I can. But that's me :)

  • 2
    I once had a CTO who insisted that it was possible to embed malicious code into an image, which could get executed as the database processes and streams the bytes of the image. I have no idea if this is true or not. Sounds unlikely to me, but he swore that, due to a bug, Oracle had had a specific problem with it at one time. – Stewart Aug 08 '14 at 06:30
  • 1
    It's true - you can put all sorts of stuff in things like JPGs, but you have to simply make sure folder/file permissions are correct on uploaded stuff. Plus, if you're really worried about bad files hitting your server, you should probably use a conversion tool or something to discard the original and only use the safe, converted file you want. No matter how you slice it, handling user uploaded files is not fun. –  Aug 08 '14 at 06:37
  • Ok. So is that also a reason to prefer the file system to a database then? – Stewart Aug 08 '14 at 06:41
  • 1
    It's a preference thing. I'm a web designer/developer, so for someone like me, I love being able to FTP into a server, swap out files I know are referenced by different parts of the system and then be on my way. If I had to deal with a DB every time I wanted to retrieve or swap out files, I'd blow my brains out. That said, there's a case to be made for storing large amounts of images in a setup where you want to backup, say, a database as opposed to 10,000 separate images. Or, you don't want direct linking to images from a drive-by site visitor. Cases to be made both ways. –  Aug 08 '14 at 06:46
  • I'll agree on some issues. As for number 3, the connection is already established when the employee punches in and all of his/her information is retrieved from the database (to be displayed). Having the query choose one more column where the image is contained isn't a big deal in this case. I just need to figure out how I'm going to deal with the fact that the website is in one location with its own host name, while the windows app is on a remote location... – Dimitri Aug 08 '14 at 12:59