7

Recently, my colleagues and I have been discussing how to build a huge storage system that can store billions of pictures, which can be searched and downloaded quickly.

Something like Flickr, but not for an online gallery, which means most of these pictures will never be downloaded.

My colleagues suggest that we should save all these files directly in the database. I really feel that it's not a good idea, and I think a database is not designed for storing a huge number of binary files. But I don't have a very strong reason for why it's not a good idea.

What do you think about it?

Joel B Fant
faceclean
  • 11
    This has been discussed to death already: http://stackoverflow.com/questions/3748/storing-images-in-db-yea-or-nay http://stackoverflow.com/questions/815626/to-do-or-not-to-do-store-images-in-a-database http://stackoverflow.com/questions/805519/save-image-in-database – DJ. Jul 31 '09 at 14:58

4 Answers

18

When dealing with binary objects, follow a document-centric approach to the architecture and do not store documents like PDFs and images in the database; you will eventually have to refactor them out when you start seeing all kinds of performance issues with your database. Just store the file on the file system and keep the path in a table of your database. There is also a physical limit on the size of the data type that you would use to serialize and save the file in the database. Store it on the file system and access it from there.
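A minimal sketch of the "file on disk, path in a table" idea, assuming SQLite as a stand-in for whatever database you actually use. The `images` table, its columns, the content-hash file naming, and the function names are all hypothetical choices for illustration, not something a particular product prescribes.

```python
import hashlib
import sqlite3
from pathlib import Path


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the database holds only metadata and the file path.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "id INTEGER PRIMARY KEY, "
        "original_name TEXT NOT NULL, "
        "path TEXT NOT NULL, "
        "size_bytes INTEGER NOT NULL)"
    )


def store_image(conn: sqlite3.Connection, root: Path, name: str, data: bytes) -> int:
    """Write the bytes to the file system and record only the path in the database."""
    root.mkdir(parents=True, exist_ok=True)
    # Assumption: name the file after its SHA-256 digest to avoid name collisions.
    digest = hashlib.sha256(data).hexdigest()
    path = root / f"{digest}{Path(name).suffix}"
    path.write_bytes(data)
    cur = conn.execute(
        "INSERT INTO images (original_name, path, size_bytes) VALUES (?, ?, ?)",
        (name, str(path), len(data)),
    )
    conn.commit()
    return cur.lastrowid


def load_image(conn: sqlite3.Connection, image_id: int) -> bytes:
    """Look up the path in the database, then read the bytes from disk."""
    row = conn.execute(
        "SELECT path FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    return Path(row[0]).read_bytes()
```

The database row stays a few hundred bytes regardless of image size, so backups, indexes, and queries on the table are not affected by the binary payload.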

Srikar Doddi
  • 8
    Interestingly SQL Server 2008 does this for you with the FILESTREAM storage option- http://msdn.microsoft.com/en-us/library/cc949109.aspx – RichardOD Jul 31 '09 at 15:03
  • Although I agree with this, doesn't SharePoint store almost everything in the database? If so, I'd think that the SharePoint people might not think it is a bad idea to store files in the database. I believe it is beneficial in some ways (like querying), but those ways probably don't fully counteract the things you've mentioned here. – Dusty Jul 31 '09 at 15:13
  • @RichardOD, I read the paper and it mainly talks about the same challenges of storing structured content vs. unstructured content and recommends NTFS. "FILESTREAM is a new feature in the SQL Server 2008 release. It allows structured data to be stored in the database and associated unstructured (i.e., BLOB) data to be stored directly in the NTFS file system. You can then access the BLOB data through the high-performance Win32® streaming APIs, rather than having to pay the performance penalty of accessing BLOB data through SQL Server." – Srikar Doddi Jul 31 '09 at 15:29
2

If you are really talking about billions of images, I would store them in the file system, because retrieval will be faster than serializing and deserializing the images.
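One practical detail at that scale, which is my assumption rather than part of the answer above: most file systems slow down when a single directory holds a very large number of entries, so a common approach is to spread files across subdirectories derived from a hash. The sketch below uses a hypothetical two-level layout based on the SHA-256 digest of the content; the function name and the `.jpg` default are illustrative only.

```python
import hashlib
from pathlib import Path


def sharded_path(root: Path, data: bytes, suffix: str = ".jpg") -> Path:
    """Return a path like root/ab/cd/abcd...ef.jpg derived from the content hash."""
    digest = hashlib.sha256(data).hexdigest()
    # Two levels of two hex characters give 256 * 256 = 65,536 directories.
    return root / digest[:2] / digest[2:4] / f"{digest}{suffix}"
```

With 65,536 directories, a billion files works out to roughly 15,000 files per directory, and the path can be recomputed from the digest without scanning anything.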

Dickson Xavier
andrewWinn
1

The answers above appear to assume the database is an RDBMS. If your database is a document-oriented database with support for binary documents of the size you expect, then it may be perfectly wise to store them in the database.

Eric Bloch
  • Could you name a couple of such databases? – Moonwalker Aug 08 '12 at 13:18
  • 2
    MarkLogic (http://developer.marklogic.com/) supports storing XML, JSON, text, and binary documents. There is a REST API to get you going quickly at http://github.com/marklogic/Corona as well as a native query language (XQuery). – Eric Bloch Aug 13 '12 at 12:10
0

It's not a good idea. The point of a database is that you can quickly resolve complex queries to retrieve textual data. While binary data can be stored in a database, it can slow transactions. This is especially true when the database is on a separate server from the running application. In the database, store the metadata and the location/filename of each image. The images themselves should be on one or more static servers.
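As a rough sketch of that split, assuming SQLite for the metadata and a made-up static host: the database answers the search, and the result is just a list of URLs that the client fetches from the static server. The `image_meta` table, the `tag` and `taken_on` columns, and `STATIC_BASE_URL` are all hypothetical.

```python
import sqlite3

# Hypothetical base URL of the static file server that holds the images.
STATIC_BASE_URL = "https://static.example.com/images"


def init_meta_db(conn: sqlite3.Connection) -> None:
    # Only searchable metadata and the filename live in the database.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_meta ("
        "id INTEGER PRIMARY KEY, "
        "filename TEXT NOT NULL, "
        "tag TEXT NOT NULL, "
        "taken_on TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_image_meta_tag ON image_meta (tag)")


def find_image_urls(conn: sqlite3.Connection, tag: str, limit: int = 20) -> list[str]:
    """Search the metadata and return static-server URLs, newest first."""
    rows = conn.execute(
        "SELECT filename FROM image_meta WHERE tag = ? "
        "ORDER BY taken_on DESC LIMIT ?",
        (tag, limit),
    ).fetchall()
    return [f"{STATIC_BASE_URL}/{filename}" for (filename,) in rows]
```

The query never touches image bytes, so the database transaction stays short even if the images themselves are large.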

Corey D