1

There are several questions with excellent answers on SO regarding the quintessential BLOB vs filesystem question. However, none of them seem to represent my scenario so I'm asking this.

Say there's a social network (a hypothetical scenario, of course) where users are all free to change anyone's profile picture. And each user's profile is stored in a MySQL table with the following schema:

ID [unsigned int, primary]
USERNAME [varchar(20)]
PROFILENAME [varchar(60)]
PROFILEPIC [blob]

Now, here's the thing: What if I want to store profile images as files on the server instead of BLOBs in the db? I can understand there will have to be some kind of naming convention to ensure all files have a unique name that also maps it to the primary key on the table for easy access. So, say, the primary key is the filename for the corresponding image stored on the disk. But in my context there could be simultaneous read/writes, and quite a lot of them. MySQL would typically handle that with no problems since it locks out the row while it's being updated. But how does one handle such situations in a filesystem model?

TheLearner
  • 2,813
  • 5
  • 46
  • 94

1 Answers1

1

In your application layer, you could lock the block that does DB transaction and file IO to alleviate concurrency issues (lock example in C#).

Within this block, run your inserts/updates/deletes in a transaction. Follow that with adding/replacing/deleting the photo on disk. Let's write some pseudo-code:

lock (obj) 
{
    connection.StartTransaction();

    connection.PerformAction();
    if failed, return false;

    photoMgmt.PerformAction();
    if failed, return false;

    connection.CommitTransaction();
}

Applying similar technique with PHP; additionally use flock to perform file locking.

In other words, commit to DB after committing to filesystem. If either DB or filesystem operation fails, perform cleansing so no change is saved.

I'd use bigint ID as the primary key and GUID filenames on disk. If users preferred the application to hold the name they provided, I'd create a field called user_filename to store the filename provided by the user, and for all other purposes I'd use the GUID.

Hopefully this will provide some direction.

zedfoxus
  • 35,121
  • 5
  • 64
  • 63
  • I will look into flock()...being a PHP noob, never heard of it but sounds promising. Some questions though: Any particular reason for preferring BIGINT over INT other than anticipation of eventually having more rows in the table than INT can qualify? They say BIGINT renders queries and indexing slower compared to INT. Another question, GUID. Why would you prefer GUID over simply using the primary key as the filename? Just trying to understand you better. – TheLearner May 11 '15 at 03:25
  • Also, considering each image file is going to be less than 20kB, would you see any merit in just doing away with all the application layer overhead and just storing them as BLOB instead? – TheLearner May 11 '15 at 03:26
  • BIGINT would be beneficial for large number of records - you were right on. INT is 4 bytes and BIGINT is 8 bytes. So technically, there will be a slight speed improvement using INT but not terribly, I don't think. You can run scenarios on a test DB and I believe speed differences will be negligible. I'll write about GUID in a different comment. – zedfoxus May 11 '15 at 13:07
  • You can use ID or a different primary key as the filename. When different teams work on projects, they may do the same thing with a different table and now your app may store 100.jpg and theirs may attempt to store 100.jpg also, causing your file to be clobbered. Good communication and architecture can solve that issue. To be on the safe side, I prefer GUID. When moving files from one app/server/company to another, GUIDs will have low chance of collision. – zedfoxus May 11 '15 at 13:09
  • Images can be certainly stored in MySQL. There is a good discussion about that on [question 3748](http://stackoverflow.com/questions/3748/storing-images-in-db-yea-or-nay) and [question 815626](http://stackoverflow.com/questions/815626/to-do-or-not-to-do-store-images-in-a-database) – zedfoxus May 11 '15 at 13:11
  • Thanks @zfus. Apparently, this scenario has a name! They call it race conditions. Well, I did look up flock() and for what it's worth, it doesn't really solve my problem. For instance, say I get an exclusive lock on an image upon match before proceeding to replace it with the one sent in via $_POST. Now, replacing a file involves deleting the old file and then copying the new one to its location. Problem is, will the lock stay in effect the moment the old file is deleted? I figure not. I looked all over the place but couldn't come up with a PHP implementation of code-locking you mentioned. – TheLearner May 11 '15 at 22:39