9

Context

We're building a blog for an intro. to databases course project.

In our blog, we want to be able to set Labels on Posts. The Labels can't exist by themselves, they only do so if they are related to a Posts. This way, Labels that are not used by any Posts shouldn't stay in the database.

More than one Label can belong to a single Post, and more than a single Post can use a Label.

We are using both SQLite3 (locally/testing) and PostgreSQL (deployment).

Implementation

Here is the SQL (SQLite3 flavor) that we use to create those two tables, along with the relationship table:

Posts

CREATE TABLE IF NOT EXISTS Posts(
   id INTEGER PRIMARY KEY AUTOINCREMENT,
   authorId INTEGER,
   title VARCHAR(255),
   content TEXT,
   imageURL VARCHAR(255),
   date DATETIME,
   FOREIGN KEY (authorId) REFERENCES Authors(id) ON DELETE SET NULL
)

Labels

CREATE TABLE IF NOT EXISTS Labels(
   id INTEGER PRIMARY KEY AUTOINCREMENT,
   name VARCHAR(255) UNIQUE,
   -- This is not working:
   FOREIGN KEY (id) REFERENCES LabelPosts(labelId) ON DELETE CASCADE 
)

LabelPosts (relation between Post [1..*] -- * Label)

CREATE TABLE IF NOT EXISTS LabelPosts(
    postId INTEGER,
    labelId INTEGER,
    PRIMARY KEY (postId, labelId),
    FOREIGN KEY (postId) REFERENCES Posts(id) ON DELETE CASCADE
)

Problem

  • Using SQLite3, Labels are not deleted from the database when I remove all references to it from the LabelPosts table. I think for the reason given by Postgres, despite SQLite accepting the table without warning.

  • PostgreSQL complains that labelId is not unique within LabelPosts, which is true and also required, since it's many-to-many:

pq: S:"ERROR" R:"transformFkeyCheckAttrs" L:"6511" C:"42830" F:"tablecmds.c"
M:"there is no unique constraint matching given keys for referenced table \"labelposts\""

So I understand that I'm doing my constraint wrong. However I don't know how to do it properly.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
AntoineG
  • 1,237
  • 4
  • 15
  • 25
  • Are you allowed to use triggers? – Priidu Neemre Mar 23 '13 at 01:02
  • Yes I am, but I'm not too familiar with them. – AntoineG Mar 23 '13 at 01:05
  • 3
    This does not look like a very standard approach of doing things. I can see what you're trying to achieve, however the foreign key constraint on table `Labels` does not fit. You should do `CONSTRAINT fk_LabelPosts_labelId FOREIGN KEY(labelId) REFERENCES Labels(labelId)` on `LabelPosts` instead (in my mind). Some other minor observations: 1. It is often advised to name your entities in singular form, ie. `Label`, `Post` and `LabelPost` in this case. 2. I would recommend naming all constraints - this will make it easier to DROP unneccessary constraints later. – Priidu Neemre Mar 23 '13 at 01:11
  • If I switch the constraint on the relation, it implies that a `Post` could only form a relation with a `Label` if this `Label` already exists, which is not what I want. Also, it wouldn't solve my problem where I want `Labels` to disappear automatically once they are not referenced by any `Post`. – AntoineG Mar 23 '13 at 01:16

2 Answers2

26

We are using both SQLite3 (locally/testing) and PostgreSQL (deployment).

This is begging for trouble. You will keep running into minor incompatibilities. Or not even notice them until much later, when damage is done. Don't do it. Use PostgreSQL locally, too. It's freely available for most every OS. For someone involved in a "databases course project" this is a surprising folly. Related:

Other advice:

  • As @Priidu mentioned in the comments, your foreign key constraints are backwards. This is not up for debate, they are simply wrong.

  • In PostgreSQL use a serial or IDENTITY column (Postgres 10+) instead of SQLite AUTOINCREMENT. See:

  • Use timestamp (or timestamptz) instead of datetime.

  • Don't use mixed case identifiers.

  • Don't use non-descriptive column names like id. Ever. That's an anti-pattern introduced by half-wit middleware and ORMs. When you join a couple of tables you end up with multiple columns of the name id. That's actively hurtful.

  • There are many naming styles, but most agree it's better to have singular terms as table names. It's shorter and at least as intuitive / logical. label, not labels.

Everything put together, it could look like this:

CREATE TABLE IF NOT EXISTS post (
   post_id   serial PRIMARY KEY
 , author_id integer
 , title     text
 , content   text
 , image_url text
 , date      timestamp
);

CREATE TABLE IF NOT EXISTS label (
   label_id  serial PRIMARY KEY
 , name      text UNIQUE
);

CREATE TABLE IF NOT EXISTS label_post(
    post_id  integer REFERENCES post(post_id) ON UPDATE CASCADE ON DELETE CASCADE
  , label_id integer REFERENCES label(label_id) ON UPDATE CASCADE ON DELETE CASCADE
  , PRIMARY KEY (post_id, label_id)
);

Trigger

To delete unused labels, implement a trigger. I supply another version since I am not happy with the one provided by @Priidu:

CREATE OR REPLACE FUNCTION f_trg_kill_orphaned_label() 
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   DELETE FROM label l
   WHERE  l.label_id = OLD.label_id
   AND    NOT EXISTS (
      SELECT 1 FROM label_post lp
      WHERE  lp.label_id = OLD.label_id
      );
END
$func$;
  • The trigger function must be created before the trigger.

  • A simple DELETE command can do the job. No second query needed - in particular no count(*). EXISTS is cheaper.

  • Single-quotes around the language name are tolerated, but it's an identifier really, so just omit the nonsense: LANGUAGE plpgsql

CREATE TRIGGER label_post_delaft_kill_orphaned_label
AFTER DELETE ON label_post
FOR EACH ROW EXECUTE PROCEDURE f_trg_kill_orphaned_label();

There is no CREATE OR REPLACE TRIGGER in PostgreSQL, yet. Just CREATE TRIGGER.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • You should prefer Erwin's implementation of the trigger, as he does everything in one atomic statement which is always good (not only for being faster). – Priidu Neemre Mar 23 '13 at 02:07
  • @Priidu .. also minimizing the chance for a race condition. :) – Erwin Brandstetter Mar 23 '13 at 02:07
  • Wow, that's a great lot of good info! I really appreciate your insight, I didn't get to receive practical advices like that from class. – AntoineG Mar 23 '13 at 02:25
  • @ErwinBrandstetter great answer! I kinda not agree when you said `Use PostgreSQL locally, too. It's freely available for most every OS` Usually developers do *not* and should not install the full DBMS on their server, too much maintenance and loss of resources. The proper way of doing this is to use an abstraction like ORM to will hide all this SQL stuff from the user. In my previous projects I almost always use H2 Memory Store locally and production oracle/mssql/postgres. Allowed rapid development and easy dev setup. – Mohamed Mansour Mar 23 '13 at 07:20
  • 9
    @MohamedMansour: Well, I disagree. ORMs are primitive crutches that regularly fail to get the full potential out of your RDBMS. With Oracle, it might make sense to reduce costs, but that doesn't apply with PostgreSQL. Even if you work with an ORM, there is nothing to gain from using different RDBMS. You can only lose. You need a query to be fast and decide to go with a certain query, because it delivers best performance? Guess what? Not true on the productive DB ... Develop with the same RDBMS if you can! That's a no-brainer. – Erwin Brandstetter Mar 23 '13 at 07:43
  • In 2018 running a PostgreSQL instance on a development VM or machine is fine imo. mysql out of the box uses about 256MB and most people have 2GB+. Much better than running two different dbs in test/production or crippling yourself by using an ORM. – rjh Oct 26 '18 at 10:15
4

One way to achieve the behaviour you seek (delete unused labels from the database) would be to use triggers.

You could try writing something like:

CREATE OR REPLACE TRIGGER tr_LabelPosts_chk_no_more_associated_posts 
AFTER DELETE ON LabelPosts 
FOR EACH ROW 
EXECUTE PROCEDURE f_LabelPosts_chk_no_more_associated_posts();


CREATE OR REPLACE FUNCTION f_LabelPosts_chk_no_more_associated_posts() 
RETURNS TRIGGER AS $$
DECLARE
    var_associated_post_count INTEGER;
BEGIN
    SELECT Count(*) AS associated_post_count INTO var_associated_post_count FROM LabelPosts WHERE labelId = OLD.labelId;
    IF(var_associated_post_count = 0) THEN
        DELETE FROM Labels WHERE labelId = OLD.labelId;
    END IF;
END
$$ LANGUAGE 'plpgsql';

Basically, what happens here is:

  1. A row is deleted from table Posts.
  2. The deletion is cascaded to all associated rows in LabelPosts (thanks to your foreign key constraint).
  3. After the deletion of every single row in LabelPosts the trigger is activated, which in turn calls the PostgreSQL function.
  4. The function checks whether there are any other posts connected with the labelId in question. If so, then it finishes without any further modification. However, if there aren't any other rows in the relationship table, then the label is not used elsewhere and can thus be deleted.
  5. The function executes a delete DML on the Labels table, effectively removing the (now) unused label.

Obviously the naming isn't the best and there must be a ton of syntax errors in there, so see here and here for more information. There may be better ways to taking this thing down, however at the moment I can't think of a fast method that would not destroy the nice generic-looking table structure.

Although bare in mind - it is not generally a good practice to overburden your database with triggers. It makes every associated query/statement run a tat slower & also makes administration considerably more difficult. (Sometimes you need to disable triggers to perform certain DML operations, etc. depending on the nature of your triggers).

Priidu Neemre
  • 2,813
  • 2
  • 39
  • 40