3

This may be a long shot, but I thought I'd ask anyway.

I am looking at using Heroku's new Crane Postgres DB (400 MB RAM Cache) in conjunction with an app I'm deploying on Heroku. The 400 MB cache size should be plenty for our needs... except for one column of one table, in which we store a cached PDF file as a string. The PDF's could easily use up the 400MB RAM pretty quickly if Heroku uses its Cache for them.

If I were on an actual server, I'd just store the PDF as a file, but given Heroku's ephemeral file system, my life is much simpler if I just store the pdf in the DB rather than rigging up a connection to S3 just for this one thing. (It further complicates that we're looking at deploying multiple heroku instances, one for each client ... so using the DB's is simpler than creating a new bucket for each one.) I don't really care about the speed on this. If people are getting the file, they will expect speeds as if it were coming from a file system anyhow, since thats how most file downloads are done. Is there any way to tell PostGRES to not bother caching this column?

Or maybe I'm asking the wrong question, and there is some other way to solve the problem or design alternatives that make it irrelevant.

B Robster
  • 40,605
  • 21
  • 89
  • 122

3 Answers3

4

You don't have to do anything. PostgreSQL will automatically use TOAST on values larger than 8 kB.

From http://www.postgresql.org/docs/9.1/static/storage-toast.html

PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is affectionately known as TOAST (or "the best thing since sliced bread").

PostgreSQL caching is also done at the page level so TOAST does not have to be cached with the rest of the row (http://www.westnet.com/~gsmith/content/postgresql/InsideBufferCache.pdf).

  • I ended up using Base64 Encoding my files and using a Text data type field, feeling nice and warm that it would be TOASTed. The alternitive (using a LOB datatyp) required too much custom postgresql syntax for my tastes. This link was very helpful: http://wiki.postgresql.org/wiki/BinaryFilesInDB#Storing_Binary_files_in_the_Database – B Robster Jul 11 '12 at 06:00
3

The fact that Postgres can TOAST large field values, it doesn't mean it's the best thing to do.

If you store big fields in your main database, it will make many things harder, such as creating forks or followers, and creating and restoring backups in particular. I would strongly reconsider utilizing S3 to store the PDF files, and simply invest in automated onboarding of new clients (create heroku app, provision database, provision/create S3 bucket).

hgmnz
  • 13,208
  • 4
  • 37
  • 41
  • 1
    As we move forward, the DB-files, as predicted, are proving themselves to be somewhat of a bottleneck. It got us off the ground a bit faster, so it wasn't necessarily bad to start with, but I'm hoping to move to an S3-based system soon. – B Robster Sep 13 '12 at 07:38
0

I'm not quite sure how you're managing to store large PDF's, since Postgres imposes a maximum field size (or at least a maximum page size). However, you might be able to get around this by using TOAST. TOASTed items are stored in a separate (physical) table, so if you're not selecting them frequently they shouldn't be cached.
If you are selecting them frequently, then I'm not sure if what you want is possible. Remember that Postgres only supplies one "level" of caching - the Linux VFS does caching also.

alex
  • 207
  • 1
  • 4
  • 17
  • I've read that bytea has a (theoretical) max size of 1GB or more. My PDF's will consistently be a few MB each. How does the max size of bytea relate to maximum page size? (looking at TOAST right now, thanks!) – B Robster Jul 10 '12 at 22:40
  • The following were very informative: http://stackoverflow.com/questions/54500/storing-images-in-postgresql http://wiki.postgresql.org/wiki/BinaryFilesInDB#Storing_Binary_files_in_the_Database – B Robster Jul 10 '12 at 22:50