11

Possible Duplicate:
storing uploaded photos and documents - filesystem vs database blob

I am starting to develop a web app, the primary purpose of which is to display photos. The users will be able to upload photos as well.

The first question that came up was where to store the photos: on the file system or the database.

I will be using a Windows box to host the site. The database is MySQL and the backend code is in C# utilizing ASP.NET MVC.

Community
  • 1
  • 1
AngryHacker
  • 59,598
  • 102
  • 325
  • 594

10 Answers10

31

Filesystem, of course, unless you're aiming for a story on thedailywtf. The easiest way is to have the photos organized by a property you can derive from the file itself, such as its SHA-1 hash. Then just store the hash in the database, attached to the photo's primary key and other attributes (who uploaded it, upload date, etc).

It's also a good idea to divvy up the photos on the filesystem, so you don't end up with millions of files in a single directory. So you'll have something like this:

storage/00/e4/f56c0de1c61fdb926e79e8a0a65bd12930c9.jpg
storage/25/9a/ec1c55bfb660548a6770238668c4b117d92f.jpg
storage/5d/d5/4b01d98f17a9ad9dd1526b49ba39b5aa37a1.jpg
storage/63/49/6f740b6c284ce6685dc17d473a7360ace249.jpg
storage/b1/75/066d178188dde110149a8422ab651b0ee615.jpg
storage/b1/20/a2b7d02b7b0c43530677ab06235382a37e20.jpg
storage/da/39/a3ee5e6b4b0d3255bfef95601890afd80709.jpg

This is also easy to port if you ever move to sharded storage.

John Millikin
  • 197,344
  • 39
  • 212
  • 226
  • 1
    The SHA-1 hash idea for creating directories and filenames is brilliant. Answer accepted. – AngryHacker Oct 10 '09 at 05:28
  • 2
    But what about deleting? Example: 2 users uploaded the same file. So only one file will exists because the hash (path) is the same. When one of them delete the photo then second user will lost it also. Am I right? – binball Mar 20 '13 at 12:24
  • @John Millikin, how are you generating and storing the storage directory in the database? – Code Junkie Oct 01 '13 at 01:49
  • 1
    @binball You can keep a simple counter for each hash. If a user uploads a file that matches an existing hash, just increase the count by one. If a user deletes, you decrease the count, and if the count reaches zero, safely delete the file. It's a form of 'Reference Counting' – Bert Jan 22 '19 at 17:19
  • I have similar question about how developers in industry structure the folders & uploaded files , and come across this stackoverflow thread, this answer is great but it has been here for 11 years , so is it still considered as good practice to structure the uploaded files in this way as of 2021 ? thanks – Ham Jun 29 '21 at 15:42
  • 1
    @Han Yes, I've stored files on the file system multiple times over the course of the last 11 years. Works reasonably well. That said, for smaller installations, I've also kept the photos in the DB with no perf degradation. It's handy because you can backup & restore the DB elsewhere and everything is there - you don't have to make separate step to copy the photos. However, it does bloat the database quite a bit. So you have to weigh the pros and cons of your situation. – AngryHacker Jun 29 '21 at 17:15
4

If you're building a website around photos then forget the database. If it will become popular your database is going to be hit hard and the majority of its time will be spent delivering photos. Also databases don't scale very well. There are so much more advantages in keeping them on the file system. And you can scale very well, having static content servers, using services for content delivery.

Also, Amazon S3 or other cloud providers do have their advantages. For instance S3 + Amazon CloudFront will provide good performance. CloudFront caches your files on servers around the world so they'll be very easily/fast accessible from anywhere. BUT if we're talking pictures and the site becomes popular your bills might be quite high.

For S3 Amazon charges per storage and per transfer in/out of the cloud. For CloudFront per transfer.

David Koelle
  • 20,726
  • 23
  • 93
  • 130
Mircea Grelus
  • 2,905
  • 1
  • 20
  • 14
4

If you are using SQL Server 2008 there's a Filestream datatype that handles most of the problems mentioned about the DB getting larger. It handles all the annoying details of synchronizing between the filesystem and the table.

Look here for a blog post about the topic: Store any data in SQL Server 2008 (Katmai)

Siewers
  • 22,626
  • 3
  • 20
  • 31
3

Generally, people store binary data such as images on the filesystem, not the database. They reference the filesystem path from the database. Retrieving BLOBs (binary large objects) from the database is slower than allowing the web server to serve static files from the file system.

Asaph
  • 159,146
  • 25
  • 197
  • 199
3

I would use something like Amazon S3.

But, if the choice is between filesystem and database I would choose filesystem because it's faster to server images from a filesystem than a database.

dtc
  • 10,136
  • 16
  • 78
  • 104
3

The only reason I would put photos as BLOBs in a database would be if I had a cluster of servers, and I was using database replication to automatically copy the photos to every machine in the cluster.

Life is much simpler if you just store the photos as files, and store the filenames of the photos in the database. If you need to create unique filenames for the photos, you can use a primary key integer from the database as part of the filename. But you could also just use a hash of the photo itself, as suggested by John Milliken. That's simple, and simple is better.

steveha
  • 74,789
  • 21
  • 92
  • 117
  • can also be done on the file system. No need to put images in the DB. – Jim Blizard Oct 09 '09 at 23:24
  • 1
    "can also be done"? That's rather terse. Are you saying that automatic replication can be done? I presume so; someone must have written a file replication system. But if you already have database replication set up, it might be simpler to just shove the photos in there, rather than configuring and debugging two separate replication systems. Do you disagree? – steveha Oct 09 '09 at 23:27
3

Some people point out that it's easier to manage if everything's in the database: including making backups, and preserving referential integrity.

ChrisW
  • 54,973
  • 13
  • 116
  • 224
3

If you store it in db, the db will grow quickly and will be much, much larger. It is just a touch more complicated to get an image out of db for display then to it is to get it from a file system. On the other hand, you better make sure that the file names and paths do not get out of sync with what is stored in db. In the past i have chosen to store on disk instead of db. It made it easier for me do move the database to different boxes. Worked out well.

Tengiz
  • 31
  • 1
2

It makes life so easy when you have a blob database. You should forget about the nightmare that is file system management.

EDIT

ID
VARBINARY

From experience this is an efficient way to manage binary files. You have one database that has only binary files. How can this be any harder to backup?

ChaosPandion
  • 77,506
  • 18
  • 119
  • 157
  • 2
    Until you need to back up your database, and, *surprise*, it's got thousands of gigs of binary garbage mixed in with the metadata. – John Millikin Oct 09 '09 at 23:23
  • 1
    Yea, that should be the primary objective of any software architecture...Make "life so easy" for the developer. Forget about the operations people that have to deal with a multi-terabyte database or the user that have to wait for images to get out of a server that was made for storing DATA not images. – Jim Blizard Oct 09 '09 at 23:27
  • You guys have offered not valid reason to not store binary data. This is the same old mantra that has been spewed for years. – ChaosPandion Oct 09 '09 at 23:33
  • ... if you're more concerned with easy development or academic considerations than with practical issues of scalability and some of the IT aspects of maintainablity. – Jim Dennis Oct 09 '09 at 23:34
  • 1
    Please tell me how using the file system is more efficient or scalable? – ChaosPandion Oct 09 '09 at 23:35
  • Sometimes a mantra is, in fact, axiomatic. Sometimes people repeatedly "spew" the truth and sometimes the brave contrarian souls who rail against these truism are flat out wrong. – Jim Dennis Oct 09 '09 at 23:38
  • 1
    A file system IS a database--one that happens to be designed from the outset to store files/documents as opposed to the small, repeated fields relational stores were originally intended for. You CAN make a workable solutions with an RDBMS but you'll find a greater variety of natural and intuitive tools for dealing with files when they're in a file system. – steamer25 Oct 13 '09 at 21:26
2

We had a similar decision to make for a project I am on. The compelling thing about jamming stuff (images and other BLOBy things) into the DB is that it is is less likely that someone might delete/alter something (either intentionally or unintentionally). But, that isn't the choice we made. Instead we have the path info stored in the DB and use that to reference the data via UNC path. Data paths are stored in two parts - a part that references the location of the data relative to which machine it resides on and a part that points to which machine that group of data is on. When we need to move data around we can update the appropriate path info.

It is certainly quick to get the data without pulling out of the DB. Ultimately that was a major deciding factor.

itsmatt
  • 31,265
  • 10
  • 100
  • 164