1

There is a rule I once heard that when assigning storage size to char and varchar that instead of doing the regular 4, 8, 16, 32 rule you want to actually use 3,7,15, 31. Apparently it has something to do with optimizing the space in which it is stored.
Does anyone know if there is validity to this statement or is there a better way of assigning size to char and varchar in postgreSQL? Also is this rule for just postgreSQL or something to keep in mind in all of the SQL languages?

Eric Thomas
  • 667
  • 1
  • 8
  • 18
  • Are you referring data type alignment? If so, see Erwin Brandstetter's answer from a while back (it's a good one). – bma Jul 25 '13 at 15:25
  • I guess [this is the answer](http://stackoverflow.com/a/7431468/131874) @bma referred to. But it is not clear what you are up to. – Clodoaldo Neto Jul 25 '13 at 15:42
  • Oops! Yeah, I forgot to paste the link. Thanks! – bma Jul 25 '13 at 15:43
  • 2
    If you are talking about the limits on text types like `char(50)` or `varchar(50)` (50 symbols in this example) - in Postgres they do not affect the space you need to store this data. – Ihor Romanchenko Jul 25 '13 at 15:51
  • I guess if I am sounding a bit vague its because of my lack of deeper understand on to how the data is physically stored. – Eric Thomas Jul 25 '13 at 16:08

1 Answers1

1

You're mis-remembering something that applies at a much lower level.

Strings in the "C" language are terminated by a zero-byte. So: "hello" would traditionally take six bytes. Of course, that was back when everyone assumed a single character would fit neatly into a single byte. Not the case any more.

The other (main) way to store strings is to have a length stored at the front, and then have the characters following. As it happens that is what PostgreSQL does, and I believe it even has an optimisation so the length doesn't take up so much space with short strings.

There are also separate issues where memory access is cheaper/easier at 2/4/8 byte boundaries (depending on the age of the machine) and memory allocation can be more efficient in powers of 2 (1024, 2048, 4096 bytes).

For PostgreSQL (or any of the major scripting languages / Java) just worry about representing your data accurately. About 99% of the time fiddly low-level optimisation is irrelevant. Actually, even if you are writing in "C", don't worry about it there until you need to.

Richard Huxton
  • 21,516
  • 3
  • 39
  • 51