2

As an example, consider this hierarchical schema. enter image description here

Assume all id fields are auto incrementing primary keys and that foreign keys are named by [parent_table_name]_id convention.

The problem

As soon as there are multiple companies in the database, then companies will share all primary key sequences between them.

For example, if there are two company rows, the customer_group table could look like this

| id | company_id |
-------------------
| 1  |     1      |
| 2  |     1      |
| 3  |     2      |
| 4  |     2      |
| 5  |     1      |
-------------------

But it should look like this

| id | company_id |
-------------------
| 1  |     1      |
| 2  |     1      |
| 1  |     2      |
| 2  |     2      |
| 3  |     1      |
-------------------

This behavior should also be exhibited for customer and any other table in the tree that directly or indirectly references company.

Note that I will most likely make a second id column (named something like relative_id) for this purpose, keeping the unique id column intact, as this is really mostly for display purposes and how users will reference these data entities.


Now if this was just one level of hierarchy, it would be a relatively simple solution. I could make a table (table_name, company_id, current_id) and a trigger procedure that fires before insert on any of the tables, incrementing the current id by 1 and setting the row's relative_id to that value. It's trivial when the company_id is right there in the insert query.

But how about the tables that don't reference company directly? Like the lowest level of the hierarchy in this example, workorder, which only references customer.
Is there a clean, reusable solution to climb the ladder all the way from 'customer_id' to ultimately retrieve the parenting company_id?

Going recursively up the hierarchy with SELECTs on each INSERT doesn't sound too appealing to me, performance wise.

I also do not like the idea of just adding a foreign key to company for each of these tables, the schema would get increasingly uglier with each additional table.

But these are the two solutions I can see, but I may not be looking in the right places.

Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
Steinthor.palsson
  • 6,286
  • 13
  • 44
  • 51

2 Answers2

1

The company shouldn't care what the primary key is if you're using generated keys. They're supposed to be meaningless; compared for equality and nothing else. I grumbled about this earlier, so I'm really glad to see you write:

Note that I will most likely make a second id column (named something like relative_id) for this purpose, keeping the unique id column intact, as this is really mostly for display purposes and how users will reference these data entities.

You're doing it right.

Most of the time it doesn't matter what the ID is, so you can just give them whatever comes out of a sequence and not care about holes/gaps. If you're concerned about inter-company leakage (unlikely) you can obfuscate the IDs by using the sequence as an input to a pseudo-random generator. See the function Daniel Verité wrote in response to my question about this a few years ago, pseudo_encrypt.

There are often specific purposes for which you need perfectly sequential gapless IDs, like invoice numbers. For those you need to use a counter table and - yes - look up the company ID. Such ID generation is slow and has terrible concurrency anyway, so an additional SELECT with a JOIN or two on indexed keys won't hurt much. Don't go recursively up the schema with SELECTs though, just use a series of JOINs. For example, for an insert into workorder your key generation trigger on workorder would be something like the (untested):

   CREATE OR REPLACE FUNCTION workorder_id_tgfn() RETURNS trigger AS $$
   BEGIN
       IF tg_op = 'INSERT' THEN
           -- Get a new ID, locking the row so no other transaction can add a
           -- workorder until this one commits or rolls back.
           UPDATE workorder_ids
           SET next_workorder_id = next_workorder_id + 1 
           WHERE company_id = (SELECT company_id
                 FROM customer
                 INNER JOIN customer_group ON (customer.customer_group_id = customer_group.id) 
                 INNER JOIN company ON (customer_group.company_id = company.id)
               WHERE customer.id = NEW.customer_id)
           RETURNING next_workorder_id
           INTO NEW.id;
       END IF;
   END;
   $$ LANGUAGE 'plpgsql';

For the UPDATE ... RETURNING ... INTO syntax see Executing a Query with a Single-Row Result.

There can be gaps in normal sequences even if there's no multi-company problem. Observe:

CREATE TABLE demo (id serial primary key, blah text);

BEGIN;
INSERT INTO demo(blah) values ('aa');
COMMIT;

BEGIN;
INSERT INTO demo(blah) values ('bb');
ROLLBACK;

BEGIN;
INSERT INTO demo(blah) values ('aa');
COMMIT;

SELECT * FROM demo;

Result:

regress=#     SELECT * FROM demo;
 id | blah 
----+------
  1 | aa
  3 | aa
Community
  • 1
  • 1
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • Thank you. I think this will be the solution I'll go with. I don't suppose there is a (non-hacky) way to join down the tree in this fashion with arbitrary depth, so I could re-use the same trigger function for more tables? Though you may be right, I probably don't need perfectly sequential ids for everything (I'm just a sucker for consistency). Wondering though, you refer to a 'workorder_ids' table, do you recommend using a separate tables tables for entity (e.g. 'customer_ids', 'product_ids', etc) rather than just one big counter table which holds the table name as well? – Steinthor.palsson Oct 08 '12 at 14:33
  • @steini With some complex querying of `pg_catalog` and/or `information_schema` and the use of `EXECUTE` for dynamic SQL you might be able to do it, but it'd be horrible. Seriously, use gapless sequences only where absolutely required for business purposes, the performance is horrible because no two transactions using the same gapless sequence may run concurrently. As for many counter tables vs one big one - meh, whatever. No real difference. – Craig Ringer Oct 08 '12 at 19:56
0

"But it should look like this"

| id | company_id |
-------------------
| 1  |     1      |
| 2  |     1      |
| 1  |     2      |
| 2  |     2      |
| 3  |     1      |
-------------------

I think it should not and I think you want a many to many relationship. The customer_group table:

| id | name |
-------------
| 1  |  n1  |
| 2  |  n2  |
| 3  |  n3  |
-------------

And then the customer_group_company table:

| group_id | company_id |
-------------------------
|    1     |     1      |
|    2     |     1      |
|    1     |     2      |
|    2     |     2      |
|    3     |     1      |
-------------------------
Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
  • I don't think I want x_company join tables for every table. Would be simpler to just add a 'company_id' foreign key to each one and use my pseudo SEQUENCE solution (briefly described in the question). Also according to this, I would be re-using customer groups across companies if I'm not misunderstanding. – Steinthor.palsson Oct 08 '12 at 02:40