
Via the command

 select
   relname as "Table",
   pg_size_pretty(pg_total_relation_size(relid)) as "Size",
   pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as "External Size"
 from pg_catalog.pg_statio_user_tables
 order by pg_total_relation_size(relid) desc;

I can retrieve the sizes of my tables (according to this article), and it works. But I just came to a weird conclusion: inserting multiple values of approximately 30,000 characters each doesn't change the size.

Executing the query before inserting, I get

 tablename   |   size   | external size
-------------+----------+---------------
 participant |  264 kB  |  256 kB

After inserting (by the way, they are Base64-encoded images) and executing the select command again, I get the exact same sizes back.

I figured this couldn't be correct, so I was wondering: is the command wrong? Or does PostgreSQL do something special with very large strings?

(In pgAdmin III the strings do not show in the 'view data' view, but they are shown when executing `select base64image from participant`.)

Besides this, I was wondering (not my main question, but it would be nice to have answered): is this best practice (since my app generates Base64 images), or should I, for example, convert them to image files on the backend and store the images on my server instead of in the database?

Ivar Reukers
  • I did try to update all participant records, resetting their `base64image` to be null, but not a specific `Delete from` query – Ivar Reukers Nov 08 '16 at 09:28
  • But what does that mean if I'll insert 20 records of 30,000 characters? What will postgres do with the data if it exceeds the allocated storage? – Ivar Reukers Nov 08 '16 at 09:42
  • last question, then I'll close this question or you can submit this convo as an answer, Will the allocated storage make my retrieval of data slower in some sort? Or won't it have an effect on the speed of the database (if disk storage is large enough)? – Ivar Reukers Nov 08 '16 at 09:45
  • Have you checked to see if the data is being written to a TOAST table? – David Aldridge Nov 08 '16 at 10:06

1 Answer


Storage management

When you insert (or update) data that requires more space on disk than is currently allocated, Postgres (or actually any DBMS) will allocate additional space to store the new data.

When you delete data, either by setting a column to a smaller value or by deleting rows, the space is not immediately released to the operating system. The assumption is that this space will most probably be re-used by subsequent updates or inserts, and extending a file is a relatively expensive operation, so the database tries to avoid that (again, this is something all DBMS do).
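You can observe this yourself. A minimal sketch, assuming the `participant` table from the question: delete some rows, run a plain `VACUUM` (which only marks the space as reusable), and compare the reported sizes.

```sql
-- File size before deleting:
select pg_size_pretty(pg_total_relation_size('participant')) as before_delete;

delete from participant where base64image is not null;

-- A plain VACUUM makes the dead rows' space reusable,
-- but does not shrink the file on disk:
vacuum participant;

select pg_size_pretty(pg_total_relation_size('participant')) as after_delete;
-- Both queries will typically report the same size.
```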

If the allocated space is much bigger than the space actually needed for the stored data, this can slow down retrieval, especially for table scans ("Seq Scan" in the execution plan), as more blocks than necessary need to be read from the hard disk. This is also known as "table bloat".
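A rough way to gauge whether a table has accumulated bloat is to look at the dead-row counters in the `pg_stat_user_tables` statistics view:

```sql
-- Tables with many dead (not yet reclaimed) rows are candidates for bloat.
select relname, n_live_tup, n_dead_tup
from pg_stat_user_tables
order by n_dead_tup desc;
```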

It is possible to shrink the space used with the statement VACUUM FULL, but that should only be used if you suspect a problem with bloat. This blog post explains this in more detail.
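For completeness, this is what that looks like for the table from the question:

```sql
-- VACUUM FULL rewrites the table into a new, compacted file and
-- returns the freed space to the operating system.
-- Caution: it takes an exclusive lock on the table while it runs.
vacuum full participant;
```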

Storing binary data in the database

If you want to store images in the database, then by all means use bytea instead of a string value. An image encoded in Base64 takes roughly a third more space than the raw data would.
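Postgres can do the conversion for you with the built-in `decode` and `encode` functions. A sketch (the column name `image_data` is an assumption) of migrating the existing Base64 text to raw bytes:

```sql
-- Add a bytea column and fill it by decoding the Base64 text:
alter table participant add column image_data bytea;

update participant
   set image_data = decode(base64image, 'base64');

-- To hand Base64 back to a client that expects it:
select encode(image_data, 'base64') from participant;
```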

There are pros and cons to the question of whether binary data (images, documents) should be stored in the database or not. This is a somewhat subjective decision that depends on a lot of external factors.

See e.g. here: Which is the best method to store files on the server (in database or storing the location alone)?

  • What is the difference between `bytea` and `bytea[]`? And should I first decode the base64 string or can I immediately convert that to a byte array? – Ivar Reukers Nov 08 '16 at 10:28