1

I have an additional question related to post: Composite primary keys in databases

Please check out the post over there (otherwise I just need to repeat it).

Now my question is: If I go for an ID autoincrement as primary key (as I accepted and which allows me to again reference the current table through this key), how can I assure that a combination between User_ID and Admin_ID (both FK's) can only exist once (is unique) on insert? It is a many-to-many relationship.
It could be done in the programming of the front end (check for existing record by select), however my feelings tell me that this is not the best way and I wonder if I can directly put the restriction in the back end.

I would logically add the FK's to the primary key, but then I'm back to the composite key and this is what I was generally advised rather not to use.
What is the proper way of doing this?

Thanks for your help.

Community
  • 1
  • 1
html_programmer
  • 18,126
  • 18
  • 85
  • 158
  • Should I just put a unique constraint on the two foreign keys? Or should I create a unique index for those two columns? As you notice, I am still in a learning process about databases. The result of my work will be verified by an experienced designer, so there is no danger – html_programmer Sep 13 '12 at 13:45
  • 1
    _"Should I just put a unique constraint on the two foreign keys?"_ Yes. _"Or should I create a unique index for those two columns?"_ On most DBMSes, creating a key (which is what UNIQUE constraint is, assuming no NULLs) will automatically create an index "underneath" it, so it can be efficiently enforced. Curiously enough, a key [can (and in some cases must)](http://richardfoote.wordpress.com/2008/06/04/primary-keys-and-non-unique-indexes-whats-really-happening/) be supported by a non-unique index. – Branko Dimitrijevic Sep 13 '12 at 19:14

4 Answers4

3

how can I assure that a combination between User_ID and Admin_ID (both FK's) can only exist once (is unique) on insert?

Create a composite key.

It could be done in the programming of the front end (check for existing record by select)

It couldn't, unless you are the only client accessing the table.1

In a real, concurrent environment, you can never know if another transaction inserted the same value (as you are trying to insert) after your SELECT but before your INSERT.

And even if you were the only one accessing the database, you'd need an index for efficient execution of SELECT anyway. So why not let the DBMS utilize this index for a key?

I would logically add the FK's to the primary key, but then I'm back to the composite key and this is what I was generally advised rather not to use.

Wrong advice. If a column or a combination of columns has to be unique, you have to create a key.2 You cannot skip creating a key that enforces the correctness of your data just because you have another (surrogate) key.

Surrogates cannot generally replace natural keys, they can only be added.3 So the question becomes: is the additional overhead of the surrogate worth it? Sometimes it is, sometimes it isn't, but there are no cut-and-dry rules here.


1 Or are willing to lock the whole table, destroying the scalability in the process.

2 Though it doesn't necessarily need to be primary.

3 What they typically "replace" is the natural key's role as a primary key, but the natural key still continues to live as alternate key.

Community
  • 1
  • 1
Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • Thanks for advice. I accepted a reply from my previous post that made me decide not to go for the composite key as primary key: http://stackoverflow.com/questions/12387695/composite-primary-keys-in-databases/12388102#12388102 They explain the reasons why I chose to go for the auto-increment as primary key. But I could make the constraint on the foreign key combination as you confirmed in one of your previous comments. Would that seem acceptable? – html_programmer Sep 14 '12 at 07:30
  • I understand your point about the Select / Insert. My gut told me that doing it this way was wrong because there seemed too many holes in the idea. You explained the reasons clearly for which thanks -> This is certainly not an option for me anymore. – html_programmer Sep 14 '12 at 07:32
  • @KimGysen Your choice is is basically: (1) a natural PK (which is a composite of these two FK fields) or (2) a surrogate PK and a natural AK. This choice will be a matter of balance of criteria I mentioned in my link... – Branko Dimitrijevic Sep 14 '12 at 08:43
  • Thank you Branko, I decided to go for the second option: Surrogate primary key combined with a natural alternate key. As I perceive it, it is a disputable topic with pros / cons; but it seems to be a pretty good practical option based on the replies I received on my previous post. – html_programmer Sep 14 '12 at 09:03
1

I would go for a composite key

If I really need an auto-increment primary key, then I'll create a unique index on the two foreign key columns

Side note: The advantage of using a composite key is that when using an ORM tool like Entity Framework, it automatically recognizes it as a many to many relationship, and abstracting away the additional intersection table into just a relationship.

Jeow Li Huan
  • 3,758
  • 1
  • 34
  • 52
  • Héhé, hmm it's sounds logical. However, they advised me not to use the composite key. Is creating a unique index better/faster than creating an unique constraint? – html_programmer Sep 13 '12 at 13:21
  • My gut feeling says that composite key is faster, since I'm always doing inner joins on those columns. The auto-increment primary key serves no purpose other than reducing clustered index fragmentation, but doesn't reduce the unique index fragmentation... – Jeow Li Huan Sep 13 '12 at 13:26
  • If I'm not wrong, unique constraint is achieved by using an index – Jeow Li Huan Sep 13 '12 at 13:27
  • I see your point. However, the auto-increment may be useful to easily lookup and reference (I read in point 3 of the reply I accepted in the other post that I linked above) and I'm a little bit hesitant to use the composite primary key after the previous post. So you suggest that I could use either the unique index or the unique constraint on the two columns. – html_programmer Sep 13 '12 at 13:46
1

Consider the following (hypothetical) schema. Would you add a surrogate key to the "chessboard" table ? The values for {xxx,yyy,pc} are constrained, either by "restricted" domains, or by the PK+FK constraints.

In which cases would adding a surrogate key (for {xxx,yyy} help?

(additional constraints (such as: no more than one King per Color ...) would be necessary for a real chess game, (but part of) the business rules (such as valid moves ...) would be handled by the "application logic" anyway)

-- this is Postgres-specific:
-- create a schema to play in
DROP SCHEMA chess CASCADE ;
CREATE SCHEMA chess ;
SET search_path='chess' ;

        -- Domain with only values A-H to three allowed.
CREATE DOMAIN chess_column
        AS CHAR(1) NOT NULL
        check (value >= 'A' AND value <= 'H')
        ;
        -- Domain with only values 1-8 allowed.
CREATE DOMAIN chess_row
        AS INTEGER NOT NULL
        check (value >= 1 AND value <= 8)
        ;
        -- Table with only valid pieces
CREATE TABLE chess_piece
        ( id INTEGER NOT NULL PRIMARY KEY
        , pname varchar
        ) ;
INSERT INTO chess_piece(id,pname) VALUES
 ( -6, 'Black King' ) , ( -5, 'Black Queen' ) , ( -4, 'Black Rook' )
, ( -3, 'Black Bishop' ) , ( -2, 'Black Knight' ) , ( -1, 'Black Pawn' )
, ( 6, 'White King' ) , ( 5, 'White Queen' ) , ( 4, 'White Rook' )
, ( 3, 'White Bishop' ) , ( 2, 'White Knight' ) , ( 1, 'White Pawn' )
        ;

CREATE TABLE chessboard
        ( xxx chess_column
        , yyy chess_row
        , pc INTEGER NOT NULL REFERENCES chess_piece(id)
        , PRIMARY KEY (xxx,yyy)
        );
        -- Too lazy to enter the entire board
        -- ; only put a White Pawn at E2
INSERT INTO chessboard(xxx,yyy,pc)
SELECT 'E', 2, p.id
FROM chess_piece p
WHERE p.pname = 'White Pawn';
        ;
        -- Shift the pawn
UPDATE chessboard b
SET yyy = 4
FROM chess_piece p
WHERE b.pc = p.id
AND p.pname = 'White Pawn';
AND b.xxx = 'E' AND b.yyy = 2
        ;
        -- Try to put a piece outside the board
\echo Try put a piece outside the board
INSERT INTO chessboard(xxx,yyy,pc)
SELECT 'I', 2, p.id
FROM chess_piece p
WHERE p.pname = 'Black Pawn';
        ;
        -- add a non-existing piece
\echo add a non-existing piece
INSERT INTO chessboard(xxx,yyy,pc)
VALUES( 'H', 1, 42)
        ;
        -- Position is already occupied
\echo Position is already occupied
INSERT INTO chessboard(xxx,yyy,pc)
SELECT 'E', 4, p.id
FROM chess_piece p
WHERE p.pname = 'Black Pawn';
        ;
wildplasser
  • 43,142
  • 8
  • 66
  • 109
  • I think I understand your point. However, our application is less static than a chess game, hence also my concern. One of the comments in the other posts: "Compound primary keys make your life harder - all references from child tables and all joins between those tables will always need all the columns of the compound PK - which can get quite messy rather quickly. I don't see anything wrong with using an auto-increment ID column - other than it introduces a tiny bit of extra "non-productive" data into your model" If I want to create child tables later on, I don't want any trouble. – html_programmer Sep 14 '12 at 07:59
0

Go with the ID. I strongly agree with the answer on the other page. For quick-and-dirty applications, a composite key is fine. However, I generally put auto-incremented ids into new tables that I create, unless they are static tables used for reporting.

For your specific question, there are at least four answers that I can think of off-hand:

  1. Implement a constraint to so the two fields are never duplicated.
  2. Create a unique index on the two columns.
  3. Implement a trigger on the table to check for duplicates.
  4. Do inserts via a stored procedure that checks for validity on the data.

My preference is (4), in combination with (1) or (2). I find that controlling inserts through stored procedures gives me a lot of flexibility, particularly when I want to log or debug problems. That said, I am not usually working with high-volume transactional systems, where reducing overhead is paramount.

There is one advantage to an auto-incrementing id missed on the other answer. The following query:

select *
from t
order by 1 desc

Returns the most recently added records, assuming the id is the first column (as it is in all my tables). Just the ability to see the most recently inserted records is sufficient for me to use an id.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Many thanks for your reply and so I see. Just a question: What do you mean with "in combination"? If you implement 1 or 2, doesn't that make checking for validity (if you mean with that, checking for duplicates) superfluous? Won't the insert automatically fail when applying 1 or 2? – html_programmer Sep 13 '12 at 15:10
  • 3 and 4 cannot be done efficiently **and** correctly in a concurrent environment. If a column or a combination of columns needs to be unique, pretty much the only practical recourse is to use a key (be it primary or alternate). As for 1 and 2, I'm not quite sure what you mean by "constraint" (UNIQUE constraint?) and why you used term "index" instead of "key"? – Branko Dimitrijevic Sep 13 '12 at 18:39