Image Format for Large Storage in relation to Nature of Storage system

Question

Now, I have read these questions which may have a relation with this question: Scalable Image Storage, Large scale image storage, https://serverfault.com/q/95444.

Here is what I have found out before asking my question:

1. Facebook uses Haystack (something CLOSED-SOURCE to the open-source world), which is very efficient. It's a form of file system storage, engineered for speed and large metadata management.
2. Any operating system has a limit on the number of files in a directory and may start to perform extremely poorly when this limit is exceeded.
3. Most NoSQL developers have found it easy to use CouchDB / CouchBase Server to handle images, as it treats an image as an attachment glued to a document (a record in the database). However, this is still file system storage.
4. HDFS, NFS and ZFS are all file systems that may make it easy to handle large distributed data. However, for applications like Facebook, they could not help.
5. Proper caching is essential to highly image-dependent applications.
6. Some developers (mostly PHP) have used MySQL to keep image metadata while creating folders and sub-folders (matching the meta-info) on the file system. Each image gets a random hash name that is recorded with its metadata in the database, so it can be located quickly on the file system.
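
A minimal sketch of the layout in point 6, under stated assumptions: it uses a SHA-256 content hash instead of a random name (a swap made here so the example is deterministic), two levels of two-character sub-folders, and a hypothetical root `/var/images`. The function name `sharded_path` is illustrative and does not come from any of the linked questions.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(image_bytes: bytes, ext: str, root: str = "/var/images") -> PurePosixPath:
    """Map image content to a nested path: root/<h[0:2]>/<h[2:4]>/<h>.<ext>."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    # Two levels of 256 buckets each spread files out so that no single
    # directory has to hold a very large number of entries.
    return PurePosixPath(root) / digest[:2] / digest[2:4] / f"{digest}.{ext}"


if __name__ == "__main__":
    # The string form of this path is what would be stored with the metadata row.
    print(sharded_path(b"example image bytes", "png"))
```

The database row then only needs the hash (or the path) plus the descriptive metadata; the image bytes stay on the file system.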

After understanding these statements and many others, I have come to realise that it's very expensive to keep billions of images, a number that grows constantly, on the file system. If anyone were to use cloud storage like Amazon S3, it would kill the business because of the cost of the high image traffic as well as the storage used by the application.

I have evaluated the use of CouchBase Server, managing images as attachments. However, for an application whose image count keeps growing, this is also file system storage, and I wonder how CouchBase would behave if hundreds or thousands of people are accessing images at the same time. I could use Cloudant/BigCouch, which has auto-sharding and load balancing. The main point remains that the NoSQL solution would also be keeping images on the file system, and when images are requested at a high concurrent rate, this might bring the whole service down (images can be heavy).

My Thinking

I am thinking of managing my images in SVG format, because I think I can treat SVG data as text in my storage. Now, most NoSQL databases have a limit on the document (record) size, at most about 4MB (not sure). This presents a problem, because an SVG file can reach 6-10MB depending on the image. So I think I cannot use CouchBase Server for SVG storage. Besides, the nature of the application is such that the image data keeps growing and is never archived or removed, and CouchBase is not good for such data (highly persistent and unchanging).

This brings me back to RDBMSs (especially Oracle), which are known for good text compression. If I take SVG data plus its metadata and store it as a BLOB in an Oracle database, I have a feeling that this could work. I have heard that an Oracle table can grow to terabytes, probably with partitioning or some kind of fragmentation. But the whole point is that a table containing only text would need a lot of data to reach even 20GB.

Now, my questions arise from all the above findings:

1. Why do developers keep choosing file system storage of images as opposed to SVG? In my (probably naive) thinking, SVG can be handled as text, and hence can be compressed, encrypted, digested, split, easily stored, etc.

2. What complexities are there when an application works with images entirely as SVG, serving SVG to browsers instead of actual image files?

3. Which is technically more memory-intensive for a web server under heavy load (for example, at Facebook's scale): serving images read from the file system (.png, .jpg, .gif), or serving images as SVG (probably from a database, or from a middle tier)?

4. SVG seems not to lose quality when rendered at different zoom levels or resolutions, so why haven't developers worked with SVG a lot in image-heavy dynamic applications? I mean, is there any known loss of quality in converting from PNG, JPG or GIF to SVG?

5. Is my view of using an RDBMS like Oracle/MySQL Cluster for storing highly persistent metadata as well as the persistent SVG data very naive?

Please highlight any problems, and give your suggestions about large image storage formats. Thanks.

EDIT / UPDATE

There are tools like ImageMagick which offer command-line options for manipulating images. The most important thing I need to know is probably this: can CouchBase Server (whether single server or version 2.0) serve images at "user-experience acceptable performance" or at "Social Network Scale"?

score 1 · Answer 1 · answered Jul 16 '12 at 13:18

First, I want to mention that your understanding of image file formats may be naive, since you don't provide a lot of details. How do you intend to store (for example) PNG images "as SVG format"?

I can't answer all of your questions, but I'll make the attempt.

1. "File system or SVG" is a false dichotomy: it's easily possible to store JPG blobs in a database, or SVG files on file-system storage. You can handle any of the bitmap image formats as text too. If you want an example, try opening up a PostScript file with embedded bitmap data. Your question of "why not" implies that the two are interchangeable, and they're typically not. As an example, my company has evaluated a bunch of different file formats for document storage, and we've gone with PDF (shudder) and PS, depending on the situation. We didn't go with SVG for two reasons. First, while multi-page documents are in the official standard, SVG editors and viewers seem to have choppy support for them. Second, SVG presents some complications when being printed in an automated fashion (to demonstrate, try this experiment: whip up an SVG file and an equivalent PostScript file, then try to print both using lp).
2. I mentioned two already (though if you're dealing with a web-app, neither should bite you, since your clients will presumably be using the browser's rendering engine, and you may not need more than one page). The only other one is browser support, which is, as always, choppy on older versions of IE. You also have to be aware of the font situation: either make sure any fancy typography is treated as a path, or make sure to only use fonts that you know viewers will have access to (for web-apps, CSS3 helps a bit there).
3. SVGs and other vector/procedural representations tend to be smaller, so I'm inclined to say they'll be easier for a server to handle. This isn't based on any testing, so take it with a grain of salt. Keep in mind that they do tend to consume more resources at the client end, but that shouldn't be a very big deal in a web situation.
4. If your image can be expressed as an SVG, yes, very good idea. However, converting arbitrary bitmaps to vector representations is AFAIK an open problem. Some things don't convert well, even manually, and some things are actually larger when expressed as SVGs than as JPGs. For things like business documents, flowcharts or typography, vectors are strictly better (barring the font problem I mention above). Certain types of illustrations do better as vectors, and some do better as rasters. Finally, if you're starting out with a bitmap (say, a photograph), converting it to SVG will either noticeably drop quality, or take a lot of manual time (if it can be done well at all).
5. This is the one I can't really answer, since I've never built anything to the scale you seem to be aiming at.

Tools like ImageMagick (http://www.imagemagick.org/script/convert.php) do offer command-line options for format conversion. Like I said, I have never worked with a lot of images (which explains the `naive`). You may not need to remind me of my naivety :) But thanks for the answer - Muzaaya Joshua, Jul 16 '12 at 13:36
@MuzaayaJoshua - That's what I thought you meant. Before committing to SVG as the One True Format, try using `convert` to convert a JPG to an SVG and look at the output. If it's like the PS conversion process, `imagemagick` is going to "convert" raster to vector by using a bit-field, which doesn't get you any of the benefits of a vector (scalable resolution, small filesize, etc). I've got some experience with processing barcodes this way, and I can tell you that `imagemagick`-generated vectors/procedurals are strictly worse than using bitmaps there, though properly generated ones wouldn't be. - Inaimathi, Jul 16 '12 at 16:19

score 1 · Accepted Answer · answered Jul 16 '12 at 16:25

On databases

What is a file but data, and what is a file system but a database? Records in a database, files on a file system, keys and values in your KV-stores - those are all fruits of the same tree.

Plain file systems were developed over decades to serve the purpose of delivering files locally - on top of that you can build a distribution model.

Things like HDFS include distribution as part of the file system itself, but impose unnecessary overhead when you try to work with files locally.

Things like relational databases or KV-stores might help you lay out your diagrams or painlessly store more bits of metadata, but unless they were specifically designed to work as file storage systems, they're going to fail at it.

Picking a storage system is all about tradeoffs, and it's up to you to figure out what the best solution to your problem is. And chances are that your problems are not even close to Facebook's problems. A few servers with a CDN on top of them and you're going to be fine.

On file format

SVGs won't work for regular pictures, don't even dream about it.
On a large scale you want to do the minimum amount of transformation when you accept files: rescale/compress/crop the image if it doesn't fit your requirements, then store it. Unless you're doing some magic on those images, you don't want to convert them into different formats or compress them without a real need for it.
On a large scale you want your file to be (ordered by priority):
- served from client's cache
- served from OS cache / memory
- served from file system directly
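
As a rough illustration of the first priority (the client's cache), the sketch below builds HTTP response headers that let browsers and a CDN reuse an image without asking the origin server again. The function names `cache_headers` and `is_not_modified` are hypothetical, the one-year `max-age` is an assumption, and the `If-None-Match` check is simplified (it ignores weak validators and comma-separated lists).

```python
import hashlib


def cache_headers(image_bytes: bytes, max_age_seconds: int = 31536000) -> dict:
    """Build response headers that allow long-lived caching of an image."""
    # A content-derived ETag changes only when the bytes change.
    etag = '"' + hashlib.sha256(image_bytes).hexdigest()[:16] + '"'
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}, immutable",
        "ETag": etag,
    }


def is_not_modified(request_headers: dict, etag: str) -> bool:
    """True when the client's cached copy is current, so a 304 can be sent."""
    return request_headers.get("If-None-Match") == etag


if __name__ == "__main__":
    headers = cache_headers(b"example image bytes")
    print(headers)
    print(is_not_modified({"If-None-Match": headers["ETag"]}, headers["ETag"]))
```

Long-lived caching like this works best when an image's URL changes whenever its content does, which fits a store where images are never modified after upload.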

score 1 · Answer 3 · answered Jul 17 '12 at 04:21

I'd suggest storing your images in S3 -- don't worry about rolling your own until the economics force you to. It's much better to worry about things your users care about, than how your blobs are stored.

As for Couchbase (I'm a cofounder), we see people using it in similar use cases: typically for metadata and image tracking (who owns it, timestamps, tags, basically anything you want to store or query on). The Couchbase record would then just contain a URL to the actual image stored on S3.
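
A sketch of what such a metadata record might look like, written as plain JSON so that no particular database client is assumed. The field names, the helper `image_record`, the bucket name and the URL layout are all hypothetical choices for illustration, not a Couchbase or S3 convention.

```python
import json
from datetime import datetime, timezone


def image_record(image_id: str, owner: str, tags: list, bucket: str, key: str) -> dict:
    """Build a small JSON-serialisable document that points at the stored image."""
    return {
        "type": "image",
        "id": image_id,
        "owner": owner,
        "tags": sorted(tags),
        "uploaded_at": datetime.now(timezone.utc).isoformat(),
        # Only the location is stored here; the image bytes live in the bucket.
        "url": f"https://{bucket}.s3.amazonaws.com/{key}",
    }


if __name__ == "__main__":
    record = image_record("img-1", "alice", ["cat", "2012"], "example-bucket", "ab/cd/img-1.jpg")
    print(json.dumps(record, indent=2))
```

Keeping the document this small means the size limits on records never come into play, whatever the size of the image itself.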

score 0 · Answer 4 · answered Jul 25 '12 at 00:21

"SVGs won't work for regular pictures, don't even dream about it."

"However, converting arbitrary bitmaps to vector representations is AFAIK an open problem. Some things don't convert well, even manually, and some things are actually larger when expressed as SVGs than as JPGs."

I think both these statements are wrong.

https://sites.google.com/site/jcdsvg/svg_paradoxes.svg

See examples three and four. The cat image is saved as a medium-resolution PNG file, which allows the image to stay sharp when zoomed. It has a larger file size than a regular web image, but that is on purpose.

Storing bit-mapped images as SVG is as simple as putting them in an SVG container.
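
A minimal sketch of such a container, assuming the bitmap is a PNG and is embedded as a base64 data URI in an SVG `<image>` element. The function name `wrap_png_in_svg` is hypothetical, and the plain `href` attribute assumes an SVG 2 capable viewer (older viewers expect `xlink:href`).

```python
import base64


def wrap_png_in_svg(png_bytes: bytes, width: int, height: int) -> str:
    """Return an SVG document that embeds the given PNG bytes unchanged."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        f'viewBox="0 0 {width} {height}">'
        # The pixels are still raster data; SVG is only the wrapper here.
        f'<image width="{width}" height="{height}" '
        f'href="data:image/png;base64,{encoded}"/>'
        "</svg>"
    )


if __name__ == "__main__":
    fake_png = b"\x89PNG" + bytes(296)  # placeholder bytes, not a real image
    svg = wrap_png_in_svg(fake_png, 10, 10)
    print(len(fake_png), len(svg))
```

Note that base64 encoding makes the embedded data roughly one third larger than the original file, and the wrapped image scales exactly as the underlying bitmap would.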
