
Alright, so I'm working on an application which will use a Linux back-end running PostgreSQL to serve up images to a Windows box with the front end written in C#.NET, though the front-end should hardly matter. My question is:

  • What is the best way to deal with storing images in Postgres?

The images are around 4-6 megapixels each, and we're storing upwards of 3000. It might also be good to note: this is not a web application, there will at most be about two front-ends accessing the database at once.

akdom

7 Answers


Updating for 2012, when image sizes and image counts keep growing and growing in all applications...

We need to distinguish between the "original image" and "processed images" such as thumbnails.

As jcoby's answer says, there are two options; here is what I recommend:

  • use blob (Binary Large OBject): to store the original image in your table. See Ivan's answer (no problem with backing up blobs!), the PostgreSQL additional supplied modules, how-tos, etc.

  • use a separate database with dblink: to store the original images in another (unified/specialized) database. In this case I prefer bytea, but blob is nearly the same. Separating the database is the best way for a "unified image webservice".

  • use bytea (BYTE Array): for caching thumbnail images. Cache the small images so they can be sent quickly to the web browser (avoiding rendering problems) and to reduce server processing. Also cache essential metadata, like width and height. Database caching is the easiest way, but check your needs and server configuration (e.g. Apache modules): storing thumbnails on the file system may be better, so compare performance. Remember that it is a (unified) web service, so it can be stored in a separate database (with no backups), serving many tables. See also the PostgreSQL binary data types manual, tests with a bytea column, etc.
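As a minimal sketch of the thumbnail-caching option (the table and column names are hypothetical, not from this answer; psycopg2 is assumed as the client library), the bytea column holds the small image and its width/height metadata is cached beside it:

```python
# Hypothetical cache table: the bytea column holds the thumbnail, and
# width/height are cached so pages can be laid out without decoding it.
THUMB_CACHE_DDL = """
CREATE TABLE IF NOT EXISTS thumb_cache (
    image_id integer PRIMARY KEY,
    width    integer NOT NULL,
    height   integer NOT NULL,
    mime     text    NOT NULL DEFAULT 'image/jpeg',
    thumb    bytea   NOT NULL
);
"""

def cache_thumbnail(conn, image_id, width, height, data, mime="image/jpeg"):
    """Upsert a thumbnail; `conn` is assumed to be a psycopg2 connection."""
    import psycopg2  # imported lazily: only needed when actually writing
    with conn.cursor() as cur:
        cur.execute(
            """INSERT INTO thumb_cache (image_id, width, height, mime, thumb)
               VALUES (%s, %s, %s, %s, %s)
               ON CONFLICT (image_id) DO UPDATE SET
                   width = EXCLUDED.width, height = EXCLUDED.height,
                   mime = EXCLUDED.mime, thumb = EXCLUDED.thumb""",
            (image_id, width, height, mime, psycopg2.Binary(data)),
        )
    conn.commit()
```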

NOTE1: today the use of "dual solutions" (database+filesystem) is deprecated (!). There are many advantages to using "only the database" instead of a dual solution. PostgreSQL has comparable performance and good tools for export/import/input/output.

NOTE2: remember that PostgreSQL has only bytea; it does not have Oracle's default BLOB: "The SQL standard defines (...) BLOB. The input format is different from bytea, but the provided functions and operators are mostly the same" (from the manual).


EDIT 2014: I have not changed the original text above (my answer is from Apr 22 '12, now with 14 votes); I am opening the answer up for your changes (see "Wiki mode", you can edit!), for proofreading and for updates.
The question is stable (@Ivan's '08 answer has 19 votes); please help to improve this text.

titanofold
Peter Krauss
  • What is the reference for "...the use of "dual solutions" (database+filesystem) is deprecated..."? – dangel Feb 10 '19 at 02:13
  • Some 2019 news! [Since 2018](https://github.com/PostgREST/postgrest/pull/802) PostgREST supports direct output of *bytea* to the web. See this [NGINX simple config](https://github.com/PostgREST/postgrest/pull/802#issuecomment-280864615) to use it. See [PostgREST Guide on binary output](http://postgrest.org/en/v6.0/api.html#binary-output) – Peter Krauss Sep 01 '19 at 04:37

Re jcoby's answer:

bytea being a "normal" column also means the value is read completely into memory when you fetch it. Blobs, in contrast, can be streamed to stdout. That helps reduce the server's memory footprint, especially when you store 4-6 MPix images.

No problem with backing up blobs: pg_dump provides the "-b" option to include the large objects in the backup.

So, as you may guess, I prefer using pg_lo_*.
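The streaming point above can be sketched with psycopg2's large-object API (the function names and chunk size are mine; only `conn.lobject` is psycopg2's): the image moves in small chunks, so a 4-6 MPix file never sits fully in client memory.

```python
CHUNK = 64 * 1024  # stream in 64 KiB pieces instead of one big buffer

def store_image_lo(conn, path):
    """Copy a file into a new PostgreSQL large object and return its OID."""
    lo = conn.lobject(0, "wb")          # oid=0: let the server assign one
    with open(path, "rb") as src:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            lo.write(chunk)
    oid = lo.oid
    lo.close()
    conn.commit()                       # large objects live inside a transaction
    return oid

def fetch_image_lo(conn, oid, out):
    """Stream a large object into any writable file-like object."""
    lo = conn.lobject(oid, "rb")
    while True:
        chunk = lo.read(CHUNK)
        if not chunk:
            break
        out.write(chunk)
    lo.close()
```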

Re Kris Erickson's answer:

I'd say the opposite :). When images are not the only data you store, don't store them on the file system unless you absolutely have to. It's such a benefit to be always sure about your data consistency, and to have the data "in one piece" (the DB). BTW, PostgreSQL is great in preserving consistency.

However, true, reality is often too performance-demanding ;-), and it pushes you to serve the binary files from the file system. But even then I tend to use the DB as the "master" storage for binaries, with all the other relations consistently linked, while providing some file system-based caching mechanism for performance optimization.

Ivan Krechetov
  • After 10 years, do you think that your points are still valid? Any updates since then? – leventunver Mar 28 '18 at 16:02
  • @leventunver No, the points do not hold. For instance the first, about `BYTEA` being a "normal" column: Postgres has supported _streaming_ to/from `BYTEA` columns for many years, which means you do not have to store the contents in memory before storing it in the db. – oligofren Mar 11 '19 at 11:07

In the database, there are two options:

  • bytea. Stores the data in a column, exported as part of a backup. Uses standard database functions to save and retrieve. Recommended for your needs.
  • blobs. Stores the data externally, not normally exported as part of a backup. Requires special database functions to save and retrieve.

I've used bytea columns with great success in the past, storing 10+ GB of images across thousands of rows. PG's TOAST functionality pretty much negates any advantage that blobs have. You'll need to include metadata columns in either case for filename, content type, dimensions, etc.
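A sketch of the bytea-plus-metadata layout described above (the table and column names are illustrative; psycopg2 is assumed as the client library):

```python
# Illustrative table: the bytea column is TOASTed (compressed and moved
# out-of-line) automatically by PostgreSQL once a value grows large.
IMAGES_DDL = """
CREATE TABLE IF NOT EXISTS images (
    id           serial  PRIMARY KEY,
    filename     text    NOT NULL,
    content_type text    NOT NULL,
    width        integer,
    height       integer,
    data         bytea   NOT NULL
);
"""

def save_image(conn, filename, content_type, width, height, data):
    """Insert one image with its metadata; returns the new row id."""
    import psycopg2  # lazy import; only needed for the Binary adapter
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO images (filename, content_type, width, height, data) "
            "VALUES (%s, %s, %s, %s, %s) RETURNING id",
            (filename, content_type, width, height, psycopg2.Binary(data)),
        )
        image_id = cur.fetchone()[0]
    conn.commit()
    return image_id

def load_image(conn, image_id):
    """Fetch the content type and raw bytes of one stored image."""
    with conn.cursor() as cur:
        cur.execute("SELECT content_type, data FROM images WHERE id = %s",
                    (image_id,))
        content_type, data = cur.fetchone()
    return content_type, bytes(data)  # psycopg2 hands back a buffer/memoryview
```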

jcoby

Quick update to mid 2015:

You can use the Postgres Foreign Data Wrapper interface to store the files in a more suitable database. For example, put the files in GridFS, which is part of MongoDB, and then use https://github.com/EnterpriseDB/mongo_fdw to access them from Postgres.

That has the advantage that you can access/read/write/back up the data in both Postgres and MongoDB, depending on what gives you more flexibility.

There are also foreign data wrappers for file systems: https://wiki.postgresql.org/wiki/Foreign_data_wrappers#File_Wrappers

As an example you can use this one: https://multicorn.readthedocs.org/en/latest/foreign-data-wrappers/fsfdw.html (see here for brief usage example)

That gives you the advantage of consistency (all linked files are definitely there) and of the other ACID properties, while the files are still on the actual file system, which means you can use any file system you want and the webserver can serve them directly (OS caching applies too).

Kenyakorn Ketsombut
  • Thanks. Do foreign data wrappers (file_fdw) provide write access for images? I want to store images on a file system and their metadata in PostgreSQL, but I have to maintain consistency too. Do you have a detailed solution? Is there any other extension available? Multicorn needs Python, and I would prefer to do without Python. – Jay Khatwani Mar 18 '16 at 03:35
  • Yes, they have write access. They are fully consistent from/in both directions. And no, I don't know of an equivalent solution that does this without Python. – Kenyakorn Ketsombut Mar 18 '16 at 03:40

Update from 10 years later: In 2008 the hard drives you would run a database on had much different characteristics, and much higher cost, than the disks you would store files on. These days there are much better solutions for storing files that didn't exist 10 years ago, so I revoke this advice and advise readers to look at some of the other answers in this thread.

Original

Don't store images in the database unless you absolutely have to. I understand that this is not a web application, but if there isn't a shared file location you can point to, in order to save just the file's location in the database, such as

//linuxserver/images/imagexxx.jpg

then perhaps you can quickly set up a webserver and store the web URLs in the database (as well as the local path). While databases can handle LOBs, and 3000 images (4-6 megapixels, assuming 500 KB each) at roughly 1.5 GB isn't a lot of space, file systems are much better designed for storing large files than a database is.
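The path-in-database approach can be sketched like this (the share root, naming scheme, and `images` table are hypothetical): the bytes go to the shared location, and only the path plus metadata go into Postgres.

```python
import shutil

SHARE_ROOT = "//linuxserver/images"   # hypothetical shared location

def shared_path(image_id, ext=".jpg"):
    """Deterministic location on the share for a given image id."""
    return f"{SHARE_ROOT}/image{image_id:06d}{ext}"

def store_image(conn, image_id, local_file):
    """Copy the file onto the share, then record only its path in the DB."""
    dest = shared_path(image_id)
    shutil.copyfile(local_file, dest)
    with conn.cursor() as cur:
        cur.execute("INSERT INTO images (id, path) VALUES (%s, %s)",
                    (image_id, dest))
    conn.commit()
    return dest
```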

Kris Erickson
  • But you have to come up with a way to distribute the files over several directories. Filesystems aren't that good at storing millions of files in a *single* directory (actually tens of thousands is already a problem) – Nov 03 '13 at 10:38
  • Doesn't answer the original question. I'm personally looking to store images in Postgres just because I want SQL as my layer of abstraction and also don't want to manage the files in my ext4 filesystem. – sudo Sep 06 '17 at 20:08
  • I'm conflicted, this doesn't answer the question, but I upvoted it, because it is a better answer than an answer to the question. – Andrew Carr Jun 12 '19 at 12:13

If your images are small, consider storing them as base64 in a plain text field.

The reason is that while base64 has an overhead of 33%, with compression that mostly goes away. (See What is the space overhead of Base64 encoding?) Your database will be bigger, but the packets your webserver sends to the client won't be. In HTML, you can inline base64 in an <img src=""> tag, which can simplify your app because you won't have to serve the images as binary in a separate browser fetch. Handling images as text also simplifies things when you have to send/receive JSON, which doesn't handle binary very well.

Yes, I understand you could store the binary in the database and convert it to/from text on the way in and out of the database, but sometimes ORMs make that a hassle. It can be simpler just to treat it as straight text just like all your other fields.

This is definitely the right way to handle thumbnails.

(OP's images are not small, so this is not really an answer to his question.)
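The inline-base64 idea needs nothing beyond the standard library; a sketch (the sample bytes stand in for a real thumbnail):

```python
import base64

def to_data_uri(data, mime="image/png"):
    """Wrap raw image bytes as a data: URI usable directly in <img src="...">."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("ascii")

# 3 input bytes become 4 base64 characters: the ~33% overhead mentioned above.
uri = to_data_uri(b"abc", "image/jpeg")
print(uri)  # data:image/jpeg;base64,YWJj
```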

ccleve

2022 Answer

The most common pattern now is to store only a reference to the image in your database, and to store the image itself in object storage (e.g. an S3 bucket).

The benefits are that your database backups are smaller, the database is no longer a single point of failure for the images, load can be distributed away from the database, and cloud storage buckets are often cheaper than database storage.

The negative is that you have to manage images in two locations: delete an image from one, and your app needs to keep track and delete it from the other.
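A sketch of the reference pattern (the bucket name, key layout, and `images` table are hypothetical; the boto3 `upload_file`/`delete_object` calls are the standard S3 client API). The delete helper makes the two-location bookkeeping explicit:

```python
BUCKET = "my-app-images"   # hypothetical bucket name

def s3_key(image_id):
    """Deterministic object key for an image id."""
    return f"images/{image_id}.jpg"

def upload_image(s3, conn, image_id, local_file):
    """Put the bytes in S3; keep only the reference in Postgres."""
    key = s3_key(image_id)
    s3.upload_file(local_file, BUCKET, key)   # s3 = boto3.client("s3")
    with conn.cursor() as cur:
        cur.execute("INSERT INTO images (id, s3_key) VALUES (%s, %s)",
                    (image_id, key))
    conn.commit()
    return key

def delete_image(s3, conn, image_id):
    """Deleting means cleaning up *both* locations, as noted above."""
    with conn.cursor() as cur:
        cur.execute("DELETE FROM images WHERE id = %s", (image_id,))
    conn.commit()
    s3.delete_object(Bucket=BUCKET, Key=s3_key(image_id))
```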

Janac Meena
  • May I ask why there is no longer a single point of failure ? Database is still necessary with your solution and you also need the S3 storage, so isn't there one more point of failure instead ? – Waelmio Jan 29 '23 at 00:01
  • @Waelmio yes, there is more than one point of failure, meaning there is no single point of failure. If your database is destroyed, at least you still have the image files on S3. If your S3 is destroyed, then at least you have metadata and image IDs which can assist in data recovery. – Janac Meena Feb 02 '23 at 21:50
  • But your service can't work if either S3 or your DB is down, so no, there isn't one fewer point of failure; I indeed meant you have one more. Data recovery happens after the failure; it doesn't mean there was no failure – Waelmio Feb 02 '23 at 23:33
  • I see what you mean, however this is a little out of scope for the original question. I agree with you that to truly distribute the risk of a single-point of failure, there should be other steps taken, like a backup, data-recovery instance, replication, etc. – Janac Meena Feb 03 '23 at 17:00