0

Consider the following postgres (version 9.4) database:

testbase=# select * from employee;
 id |               name               
----+----------------------------------
  1 | johnson, jack                   
  2 | jackson, john                   
(2 rows)

testbase=# select * from worklog;
 id |             activity             | employee |            time            
----+----------------------------------+----------+----------------------------
  1 | department alpha                 |        1 | 2018-01-27 20:32:16.512677
  2 | department beta                  |        1 | 2018-01-27 20:32:18.112356
  5 | break                            |        1 | 2018-01-27 20:32:22.255563
  3 | department gamma                 |        2 | 2018-01-27 20:32:20.073173
  4 | department gamma                 |        2 | 2018-01-27 20:32:21.05962
(5 rows)

The column 'name' in table 'employee' is of type character(32) and unique, the column 'employee' in 'worklog' references 'id' from the table 'employee'. The column 'id' is the primary key in either table.

I can see all activities from a certain employee by issuing:

testbase=# select * from worklog where employee=(select id from employee where name='johnson, jack');
 id |             activity             | employee |            time            
----+----------------------------------+----------+----------------------------
  1 | department alpha                 |        1 | 2018-01-27 20:32:16.512677
  2 | department beta                  |        1 | 2018-01-27 20:32:18.112356
  5 | break                            |        1 | 2018-01-27 20:32:22.255563
(3 rows)

I would rather like to simplify the query to

testbase=# select * from worklog where employee='johnson, jack';

For this I would change 'employee' to type character(32) in 'worklog' and declare 'name' as primary key in table 'employee'. Column 'employee' in 'worklog' would, of course, reference 'name' from table 'employee'.

My question:

Will every new row in 'worklog' require additional 32 bytes for name of the 'employee' or will postgres internally just keep a pointer to the foreign field without duplicating the name for every new row?

I suppose that the answer for my question is somewhere in the documentation but I could not find it. It would be very helpful if someone could provide an according link.

PS: I did find this thread, however, there was no link to some official documentation. The behaviour might also have changed, since the thread is now over seven years old.

  • Name may change over time, go tell a now divorced lady, domestic abuse victim, that she'll have to keep her ex-husband name because your system do not allow name change... – Blag Jan 27 '18 at 20:55
  • 2
    Unrelated, but: `character` is almost always a bad choice. If you want to store characters use `varchar` or `text` –  Jan 27 '18 at 21:16
  • @Blag Not an issue here because this is just meant as an example. –  Jan 27 '18 at 22:11
  • @a_horse_with_no_name Why is it a bad choice? I thought that the fixed length of the character type allows for a more efficient processing. –  Jan 27 '18 at 22:12
  • char is a lot a space lose... you only use it for text that have a fixed length (like product ref usually); But I'm with Gordon, just make a view with the join behind ;) – Blag Jan 27 '18 at 22:36

1 Answers1

1

Postgres will store the data that you tell it to store. There are some new databases that will do compression under the hood -- and Postgres might have features to enable that (I do not know all Postgres features).

But, you shouldn't do this. Integer primary keys are more efficient than strings for three reasons:

  • They are fixed length in bytes.
  • They are shorter.
  • Collations are not an issue.

Stick with your original query, but write it using a join:

select wl.*
from worklog wl join
     employee e
     on wl.employee = e.id
where e.name = 'johnson, jack';

I suggest this because this is more consistent with how SQL works and makes it easier to choose multiple employees.

If you want to see the name and not the id, create a view (say v_worklog) and add in the employee name.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786