1

Given a postgreSQL query with the following tables:

+--------------------+
|Foo                 |
+--------------------+
| id | value1 | refe |
+--------------------+

+--------------------+
|Bar                 |
+--------------------+
| id | value2 | refe |
+--------------------+

+-------------+
|Refe         |
+-------------+
| id | value3 |
+-------------+

Where the refe columns are foreign keys to the id of Refe. Now Refe stores additional data that is relevant for both Foo and Bar (and potentially other tables).

I've created a view that stores the "active" refe's in other words: the union of the refe fields of Foo and Bar:

CREATE OR REPLACE VIEW liverefe AS 
        (SELECT refe FROM Foo)
        UNION (SELECT refe FROM Bar)

Now when one deletes a row from Foo or Bar, there is a chance that the refe becomes dead (no other Foo or Bar row refers to it). In that case it should be removed.

Can one explain how to implement such "reference counting" trigger?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
  • For pedantry's sake: This is reference counting, not GC. Garbage collection would be akin to what `VACUUM` does. – cHao Aug 07 '14 at 14:15
  • Well both terms are somehow ambiguous and overlapping in my opinion. If one sees rows in a database as objects and foreign keys are references. It's some kind of garbage collection. How I've learned it is that "reference counting" is just an *strategy* to perform garbage collection, other techniques like *Boehm* are variants... – Willem Van Onsem Aug 07 '14 at 14:25
  • The big difference is that with GC, collection is done separately, usually in batches. If you wanted to have a daemon that ran every X minutes and deleted all the rows with no refs, for example, that'd be GC. – cHao Aug 07 '14 at 14:28
  • Well that's ok as well. The point is however that I don't expect such operation to happen much. And at removal, it's clear that only the id to be removed can be collected. This gives an advantage since one knows where to look (otherwise one must first compute all live `refe`'s). – Willem Van Onsem Aug 07 '14 at 14:30
  • So it's OK for orphan rows to sit around a while even with nothing referring to them, as long as they *eventually* get deleted? – cHao Aug 07 '14 at 14:49
  • i don't know about a ref count trigger, but, have you considered a delete trigger on Foo and Bar? That trigger func could check for any remaining references. Either that, or you can maintain a 'ref_count' column in Refe, and trigger on that hitting 0 (obviously still need the Foo and Bar insert/delete triggers to manage ref_count). – Greg Aug 07 '14 at 14:50
  • @cHao: indeed... It's not of vital importance, but eventually (within reasonable time) they should disappear. – Willem Van Onsem Aug 07 '14 at 14:51
  • @Greg: well the count should not explicitly calculated. From the moment one reference exists, the row should not be deleted. Thus one can do a check at removal. – Willem Van Onsem Aug 07 '14 at 14:55

2 Answers2

2

Use NOT EXISTS anti-semi-joins.

To remove all dead rows in refe:

DELETE FROM refe r
WHERE NOT EXISTS (SELECT 1 FROM foo WHERE refe = r.id)
AND   NOT EXISTS (SELECT 1 FROM bar WHERE refe = r.id);

(Possibly) delete a specific row from refe after DELETE / UPDATE in one of the referencing tables:

DELETE FROM refe r
WHERE r.id = 12345   -- your id here
AND   NOT EXISTS (SELECT 1 FROM foo WHERE refe = r.id)
AND   NOT EXISTS (SELECT 1 FROM bar WHERE refe = r.id);

Related answer with a complete code example for such a trigger:

If you should be in the habbit of updating the refe columns, add a trigger like this:

CREATE TRIGGER foo_updelaft_kill_orphaned_refe
AFTER DELETE OR UPDATE OF refe
ON foo
FOR EACH ROW EXECUTE PROCEDURE f_trg_kill_orphaned_refe();

Same for table bar etc.
This way, the trigger is only invoked when an UPDATE actually targets the refe column.

Details in the manual.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
2

If you want the delete to happen immediately, then you can declare a trigger function like so...

CREATE FUNCTION check_ref_count()
RETURNS trigger
LANGUAGE plpgsql
AS $$
  BEGIN
    DELETE FROM Refe WHERE id NOT IN (SELECT refe from liverefe);
    RETURN NULL;
  END;
$$;

And then attach it to the tables in question.

CREATE TRIGGER slay_orphans
AFTER DELETE ON Foo
FOR EACH STATEMENT
EXECUTE PROCEDURE check_ref_count();

CREATE TRIGGER slay_orphans
AFTER DELETE ON Bar
FOR EACH STATEMENT
EXECUTE PROCEDURE check_ref_count();

(Depending on how you're working with these items, you might have to say AFTER DELETE OR UPDATE rather than just AFTER DELETE. Particularly if you change refe a lot. If you typically set it once and leave it alone, AFTER DELETE should be fine.)

Note: once you do this, a record in Refe won't survive the next deletion unless some record in liverefe has its ID. Even though that's what you want, it means that if deletions are common, even brand new records might get deleted right out from under you.

You can make things stricter by only checking Refes that deleted rows were pointing at. Just add id = old.refe to the delete conditions, and make the triggers FOR EACH ROW rather than FOR EACH STATEMENT. This'll make it so that only previously-pointed-at records are candidates for deletion. (But it also means that cleanup is not retroactive -- current orphans won't get cleaned up unless something points at them and then points elsewhere or gets deleted. And if you go this route, AFTER DELETE OR UPDATE is essential -- the sloppiness of letting the delete trigger handle everything, causes rows to basically leak if you're not cleaning up the whole table.)

Or, add a timestamp to Refe, and change the query to ignore records whose timestamp is less than X seconds/minutes/whatever ago.

However you resolve that issue, you'll probably also want to consider using transactions
to help ensure that a record's not getting deleted right as you're trying to point at it. And make sure your foreign keys are specified as ON DELETE RESTRICT (so that deleting a record in Refe won't be allowed unless the record is indeed an orphan).


If you could live with rows getting deleted every so often rather than immediately, it gets easier still, since you don't have to create a function and such.

Just create a script that issues this query:

DELETE FROM Refe WHERE id NOT IN (SELECT refe FROM liverefe)

and set that up as a cron job (or scheduled task, in Windows).

cHao
  • 84,970
  • 20
  • 145
  • 172
  • 1
    Ah, you have been working on this while I posted. I know the feeling. I was faster because I posted a complete solution before. Still, very good and complete answer. – Erwin Brandstetter Aug 07 '14 at 23:20
  • Nice solution, was hard to select one as the accepted solution :(. When tried, pgsql errored on `RETURN 0`, should probably be replaced with `RETURN NULL`? – Willem Van Onsem Aug 08 '14 at 11:28
  • @CommuSoft: Possibly. The return value is supposed to be ignored for trigger functions, though... – cHao Aug 08 '14 at 13:05
  • @cHao: yes but according to the PostgreSQL specifications a trigger needs a `RETURN` statement that either returns `NULL` or a row that matches the table on which the trigger is applied. – Willem Van Onsem Aug 08 '14 at 13:09
  • @CommuSoft: Ahh, you're right. Just looked over the docs again. :/ Edited. – cHao Aug 08 '14 at 13:27