14

I'm trying to make a blog system of sort and I ran into a slight problem.

Simply put, there's 3 columns in my article table:

id SERIAL,
category VARCHAR FK,
category_id INT

id column is obviously the PK and it is used as a global identifier for all articles.

category column is well .. category.

category_id is used as a UNIQUE ID within a category so currently there is a UNIQUE(category, category_id) constraint in place.

However, I also want for category_id to auto-increment.

I want it so that every time I execute a query like

INSERT INTO article(category) VALUES ('stackoverflow');

I want the category_id column to be automatically be filled according to the latest category_id of the 'stackoverflow' category.

Achieving this in my logic code is quite easy. I just select latest num and insert +1 of that but that involves two separate queries. I am looking for a SQL solution that can do all this in one query.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
yhware
  • 502
  • 3
  • 13
  • 26
  • @a_horse_with_no_name lol I completely forgot about that. Can't believe I made that mistake yet again after countless wasted hours. – yhware Jan 02 '16 at 23:51

3 Answers3

9

This has been asked many times and the general idea is bound to fail in a multi-user environment - and a blog system sounds like exactly such a case.

So the best answer is: Don't. Consider a different approach.

Drop the column category_id completely from your table - it does not store any information the other two columns (id, category) wouldn't store already.

Your id is a serial column and already auto-increments in a reliable fashion.

If you need some kind of category_id without gaps per category, generate it on the fly with row_number():

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • If nothing else really depends on that `category_id` being stored inside a table, then is there a problem with race conditions? To my understanding there is only such a problem if some unpredictable actions may happen in succession. I mean, considering transactions, could it be that there will be two exact values for a given `category`? – Kamil Gosciminski Jan 03 '16 at 22:29
  • @ConsiderMe: Yes, exactly that can (and will) happen. There is a time gap between selecting the current maximum and committing the transaction in which a new row with maximum+1 has been inserted. During this time, any number of concurrent sessions will see the same maximum and consequently try to insert the same number, producing duplicates or raising exceptions with unique violations if you have a unique constraint. – Erwin Brandstetter Jan 03 '16 at 22:53
  • Thank you for the reply. Basically one could handle it by making a unique constraint and doing an upsert, am I right? The second option would be to have `Read uncommitted` isolation level? __This comment is only for the purpose of handling such cases, not stating the point that should be followed.__ – Kamil Gosciminski Jan 03 '16 at 22:58
  • 1
    @ConsiderMe: UPSERT of pg 9.5 offers an alternative action in case of unique violations. [I just wrote a related answer.](http://stackoverflow.com/a/34580928/939860) But the alternative can still fail, and it does *not* solve the concurrency issues at hand. `read uncommitted` would reduce the window for some race conditions, but *not* solve the issue either, and introduce a new problem with [phantom reads](https://en.wikipedia.org/wiki/Isolation_%28database_systems%29#Phantom_reads). `serializable` would solve the problem - at the cost of major performance loss and serialization failures. – Erwin Brandstetter Jan 03 '16 at 23:15
  • If you insist on maintaining a `category_id`, the best course of action would be to have a FK constraint on `category` to a `category` table and take a row level lock on the category before adding a new child row. There was a related answer along that line, but I can't find it ATM. I wouldn't go there *at all*, that's why my answer revolves around **"Don't."** – Erwin Brandstetter Jan 03 '16 at 23:19
  • Acquiring a row-level lock would do the trick. Thanks again for some additional clarification :) – Kamil Gosciminski Jan 03 '16 at 23:23
8

Concept

There are at least several ways to approach this. First one that comes to my mind:

Assign a value for category_id column inside a trigger executed for each row, by overwriting the input value from INSERT statement.

Action

Here's the SQL Fiddle to see the code in action


For a simple test, I'm creating article table holding categories and their id's that should be unique for each category. I have omitted constraint creation - that's not relevant to present the point.

create table article ( id serial, category varchar, category_id int )

Inserting some values for two distinct categories using generate_series() function to have an auto-increment already in place.

insert into article(category, category_id)
  select 'stackoverflow', i from generate_series(1,1) i
  union all
  select 'stackexchange', i from generate_series(1,3) i

Creating a trigger function, that would select MAX(category_id) and increment its value by 1 for a category we're inserting a row with and then overwrite the value right before moving on with the actual INSERT to table (BEFORE INSERT trigger takes care of that).

CREATE OR REPLACE FUNCTION category_increment()
RETURNS trigger
LANGUAGE plpgsql
AS
$$
DECLARE
  v_category_inc int := 0;
BEGIN
  SELECT MAX(category_id) + 1 INTO v_category_inc FROM article WHERE category = NEW.category;
  IF v_category_inc is null THEN
    NEW.category_id := 1;
  ELSE
    NEW.category_id := v_category_inc;
  END IF;
RETURN NEW;
END;
$$ 

Using the function as a trigger.

CREATE TRIGGER trg_category_increment 
  BEFORE INSERT ON article 
  FOR EACH ROW EXECUTE PROCEDURE category_increment()

Inserting some more values (post trigger appliance) for already existing categories and non-existing ones.

INSERT INTO article(category) VALUES 
  ('stackoverflow'),
  ('stackexchange'),
  ('nonexisting');

Query used to select data:

select category, category_id From article order by 1,2

Result for initial inserts:

category    category_id
stackexchange   1
stackexchange   2
stackexchange   3
stackoverflow   1

Result after final inserts:

category    category_id
nonexisting     1
stackexchange   1
stackexchange   2
stackexchange   3
stackexchange   4
stackoverflow   1
stackoverflow   2
Kamil Gosciminski
  • 16,547
  • 8
  • 49
  • 72
  • I think you could use `FOUND` special variable available in Postgres (instead of checking if variable is null), but SQLFiddle seems to be picky on that, or I'm missing something. – Kamil Gosciminski Jan 02 '16 at 22:37
  • This looks promising! Thank you :) – yhware Jan 03 '16 at 00:00
  • 1
    Don't use this in a multi-user environment. There are unpredictable race conditions. – Erwin Brandstetter Jan 03 '16 at 03:16
  • Erwin is right about that. I wrote the answer merely to show approach to get the job done which you asked about. You could achieve the same thing by using window functions and ordering your partitions by a serial column. This way you'd always calculate those sequences on-the-fly without storing them. – Kamil Gosciminski Jan 03 '16 at 22:25
  • @Kamil G. oh, thank you for your solution. It helped me with my task! – Sviatlana Sep 08 '17 at 12:39
-2

Postgresql uses sequences to achieve this; it's a different approach from what you are used to in MySQL. Take a look at http://www.postgresql.org/docs/current/static/sql-createsequence.html for complete reference.

Basically you create a sequence (a database object) by:

CREATE SEQUENCE serials;

And then when you want to add to your table you will have:

INSERT INTO mytable (name, id) VALUES ('The Name', NEXTVAL('serials')
Darko Maksimovic
  • 1,155
  • 12
  • 14
  • AAAh I get it. So this looks like a pretty neat problem. If I understood is correctly, I need to create a sequence for all the categories right? If so, is there anyway to do this automatically whenever a new category is added? – yhware Jan 02 '16 at 22:14
  • 1
    I wouldn't really recommend following the approach in this answer. This would require your application logic to do the distinction between categories and manage sequences, and also that doesn't seem like a good design to have multiple sequences for one column. – Kamil Gosciminski Jan 02 '16 at 22:17
  • oh... in that case could u recommend a better approach? Even just a search keyword would be appreciated :) – yhware Jan 02 '16 at 22:22
  • In correction to what I've written earlier: You could also do this distinction inside a trigger, but then there is the cost of maintaining it or executing more complex queries to do so. – Kamil Gosciminski Jan 02 '16 at 22:40
  • 1
    That's what the existing [`serial`](http://www.postgresql.org/docs/current/interactive/datatype-numeric.html#DATATYPE-SERIAL) column `id` already does automatically. Your answer would just interfere with it in unhelpful ways. – Erwin Brandstetter Jan 03 '16 at 03:19