I have read these related questions: Scalable Image Storage, Large scale image storage, and https://serverfault.com/q/95444.
Before asking my question, here is what I have found out so far:
1. Facebook uses Haystack (which is closed source, i.e. not available to the
open-source world), and it is very efficient. It is a form of file-system storage
engineered for speed and for managing large amounts of metadata.
2. Every operating system has a limit on the number of files per directory, and
it may perform extremely poorly as a directory approaches that limit.
3. Most NoSQL developers have found it easy to use CouchDB / Couchbase Server
for images, because it stores each image as an attachment tied to a document (a
record in the database). However, this is still file-system storage.
4. HDFS, NFS and ZFS are all file systems that can make it easier to handle large
amounts of distributed data. However, for applications at Facebook's scale, they were not enough.
5. Proper caching is essential for image-heavy applications.
6. Some developers (mostly PHP developers) have used MySQL to store image metadata
while creating folders and subfolders (derived from that metadata) on the file system.
Each image gets a random hash name, recorded alongside its metadata in the
database, so that it can be located quickly on the file system.
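Points 2 and 6 together describe a common workaround: instead of putting every file in one directory, derive a short nested path from a hash of the image's name. Below is a minimal Python sketch, assuming SHA-1 of the key and two levels of two hex characters each; the function names and the `/var/images` root are hypothetical, not taken from any particular PHP project. With those settings there are 256 × 256 = 65,536 leaf directories, so a billion images would average roughly 15,000 files per directory.

```python
import hashlib
import os
import secrets


def new_image_key() -> str:
    """Return a random 32-character hex name for a newly uploaded image."""
    return secrets.token_hex(16)


def sharded_path(root: str, key: str, depth: int = 2, width: int = 2) -> str:
    """Map an image key to root/<h0>/<h1>/<key>, where h0 and h1 are hash prefixes.

    Hashing the key spreads files evenly across directories, so no single
    folder ends up holding millions of entries.
    """
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    # Take `depth` consecutive slices of `width` hex characters from the digest.
    shards = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return os.path.join(root, *shards, key)


if __name__ == "__main__":
    key = new_image_key()  # this is what the metadata row in MySQL would store
    print(sharded_path("/var/images", key))
```

Only the key needs to be stored in the database; the directory path can always be recomputed from it.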
Having looked into these points and many others, I have come to realise that it is very expensive to keep billions of images, a number that keeps growing, on the file system. If one were to use cloud storage such as Amazon S3, the cost of the heavy image traffic and storage generated by the application could kill the business.
I have evaluated using Couchbase Server to manage images as attachments. However, for an application whose image collection keeps growing, this is still file-system storage, and I wonder how Couchbase would behave if hundreds or thousands of people were accessing images at the same time. I could use Cloudant/BigCouch, which offers auto-sharding and load balancing. The main point remains that the NoSQL solution would also keep the images on the file system, and when images are requested at a high concurrent rate, this might bring the whole service down (images can be heavy).
My Thinking
I am thinking of managing my images in SVG format, because I think I can treat SVG data as text in my storage. Most NoSQL databases limit the size of a document (record) to no more than about 4MB (I am not sure of the exact figure). This is a problem, because an SVG file can reach 6-10MB depending on the image, so I think I cannot use Couchbase Server for SVG storage. Besides, in this application the image data keeps growing and is never archived or removed, and Couchbase is not a good fit for such data (highly persistent, unchanging data).
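Because SVG is XML text, it can be gzip-compressed before storage (the same encoding used by `.svgz` files), which may bring some large SVGs under a per-document limit. The Python sketch below compresses SVG markup and checks the result against a limit; `DOC_LIMIT_BYTES` reuses the uncertain 4MB figure above purely as a hypothetical, and `make_test_svg` builds a synthetic, highly repetitive image of about 5.8MB, which is a best case. Real SVGs, especially ones traced from photographs, will generally compress less.

```python
import gzip

# Hypothetical per-document limit: the "4MB" figure is the uncertain number
# mentioned above, not a documented limit of any particular database.
DOC_LIMIT_BYTES = 4 * 1024 * 1024


def compress_svg(svg_text: str) -> bytes:
    """Gzip-compress SVG markup (the same encoding used by .svgz files)."""
    return gzip.compress(svg_text.encode("utf-8"), compresslevel=9)


def fits_in_document(svg_text: str, limit: int = DOC_LIMIT_BYTES) -> bool:
    """True if the gzip-compressed SVG would fit under the per-document limit."""
    return len(compress_svg(svg_text)) <= limit


def make_test_svg(n: int = 100_000) -> str:
    """Build a synthetic, highly repetitive SVG of n 1x1 rectangles (about 5.8MB)."""
    rects = "".join(
        f'<rect x="{i % 500}" y="{i // 500}" width="1" height="1" fill="#336699"/>'
        for i in range(n)
    )
    return f'<svg xmlns="http://www.w3.org/2000/svg" width="500" height="{n // 500}">{rects}</svg>'


if __name__ == "__main__":
    svg = make_test_svg()
    print(len(svg.encode("utf-8")), len(compress_svg(svg)), fits_in_document(svg))
```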
This brings me back to RDBMSs (especially Oracle), which are known for good text compression. If I store the SVG data plus its metadata as a BLOB in an Oracle database, I have a feeling this could work. I have heard that an Oracle table can grow to terabytes, probably with partitioning or some kind of fragmentation. But the point is that 20GB of text in an Oracle table would already represent a lot of data.
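A minimal sketch of the "SVG plus metadata in one row" layout is below. It uses Python's built-in `sqlite3` purely as a stand-in for Oracle or MySQL, and compresses the SVG in the application with gzip; a production Oracle setup would more likely rely on its own LOB storage and compression options instead. The table name, columns and helper functions are all hypothetical.

```python
import gzip
import hashlib
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store with one row per image: metadata plus compressed SVG."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS images (
               id        INTEGER PRIMARY KEY,
               owner     TEXT    NOT NULL,
               sha256    TEXT    NOT NULL UNIQUE,  -- digest of the uncompressed SVG
               raw_bytes INTEGER NOT NULL,         -- size before compression
               svg_gz    BLOB    NOT NULL          -- gzip-compressed SVG markup
           )"""
    )
    return conn


def put_svg(conn: sqlite3.Connection, owner: str, svg_text: str) -> int:
    """Insert one SVG and its metadata; return the new row id."""
    raw = svg_text.encode("utf-8")
    with conn:  # commits on success, rolls back if the INSERT fails
        cur = conn.execute(
            "INSERT INTO images (owner, sha256, raw_bytes, svg_gz) VALUES (?, ?, ?, ?)",
            (owner, hashlib.sha256(raw).hexdigest(), len(raw), gzip.compress(raw)),
        )
    return cur.lastrowid


def get_svg(conn: sqlite3.Connection, image_id: int) -> str:
    """Fetch and decompress one SVG by id."""
    row = conn.execute("SELECT svg_gz FROM images WHERE id = ?", (image_id,)).fetchone()
    if row is None:
        raise KeyError(image_id)
    return gzip.decompress(row[0]).decode("utf-8")
```

The `UNIQUE` digest column also rejects exact duplicate uploads, which matters for data that is never deleted.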
My questions, arising from the findings above, are:
1. Why do developers keep choosing file-system storage of images rather than SVG? In my (probably naive) thinking, SVG can be handled as text, and can therefore be compressed, encrypted, digested, split, easily stored, etc.
2. What complexities arise when an application works with images entirely as SVG, serving SVG to browsers instead of actual image files?
3. Which is more memory-intensive for a web server, especially under heavy load such as Facebook's: serving images read from the file system (.png, .jpg, .gif), or serving images as SVG (probably from a database or a middle tier)?
4. SVG does not seem to lose quality when rendered at different zoom levels or resolutions, so why haven't developers used SVG much in image-heavy dynamic applications? In particular, is there any known loss of quality when converting from PNG, JPG or GIF to SVG?
5. Is my idea of using an RDBMS such as Oracle or MySQL Cluster to store both the highly persistent metadata and the persistent SVG data very naive?
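Question 1 mentions that text can be digested and split. As a sketch of what that looks like in practice (and of one way around the per-document size limit discussed earlier), the Python below computes a SHA-256 digest and cuts a payload into fixed-size chunks that could each be stored as a separate record. `CHUNK_BYTES` and the function names are hypothetical. The functions operate on raw bytes, so nothing in them is specific to SVG; the same code applies unchanged to PNG or JPEG files.

```python
import hashlib

# Hypothetical chunk size, picked to stay well below a per-record size limit.
CHUNK_BYTES = 1024 * 1024


def split_for_storage(data: bytes, chunk_size: int = CHUNK_BYTES) -> tuple[str, list[bytes]]:
    """Return the SHA-256 digest of the whole payload and the payload cut into chunks."""
    digest = hashlib.sha256(data).hexdigest()
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return digest, chunks


def reassemble(digest: str, chunks: list[bytes]) -> bytes:
    """Join chunks back together and verify them against the stored digest."""
    data = b"".join(chunks)
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("checksum mismatch: a chunk is missing, reordered or corrupted")
    return data
```

For example, a 2,560,000-byte payload splits into three chunks: two of 1MB and a final one holding the remainder.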
Please share your insights and suggestions on storage formats for large image collections. Thanks.
EDIT / UPDATE
There are tools like ImageMagick that offer command-line options for manipulating images. The most important thing I need to know is probably this: is Couchbase Server (whether a single server or version 2.0) capable of serving images with acceptable user-experience performance, or at "social network scale"?