
I have the following table (PostgreSQL 8.3) which stores prices of some products. The prices are synchronised with another database: most of the fields below (apart from one) are not updated by our client, but are instead dropped and refreshed every once in a while to sync with another stock database:

CREATE TABLE product_pricebands (
    template_sku varchar(20) NOT NULL,
    colourid integer REFERENCES colour (colourid) ON DELETE CASCADE,        
    currencyid integer NOT NULL REFERENCES currency (currencyid) ON DELETE CASCADE,
    siteid integer NOT NULL REFERENCES site (siteid) ON DELETE CASCADE,

    master_price numeric(10,2),

    my_custom_field boolean, 

    UNIQUE (template_sku, siteid, currencyid, colourid)
);

During the synchronisation, I basically DELETE most of the data above, except for rows WHERE my_custom_field is TRUE (if it's TRUE, it means the client updated this record via their CMS, so it should not be dropped). I then INSERT hundreds to thousands of rows into the table, and UPDATE where the INSERT fails (i.e. where the combination of (template_sku, siteid, currencyid, colourid) already exists).
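
Roughly, the effect of the sync is this (a simplified update-then-insert sketch rather than my actual insert-and-catch-failure code; stock_import is a made-up name for the staging data):

DELETE FROM product_pricebands WHERE my_custom_field IS NOT TRUE;

-- update rows that already exist
UPDATE product_pricebands p
SET    master_price = s.master_price
FROM   stock_import s
WHERE  p.template_sku = s.template_sku
AND    p.siteid       = s.siteid
AND    p.currencyid   = s.currencyid
AND    p.colourid IS NOT DISTINCT FROM s.colourid;

-- insert rows that are not there yet
INSERT INTO product_pricebands (template_sku, colourid, currencyid, siteid, master_price)
SELECT s.template_sku, s.colourid, s.currencyid, s.siteid, s.master_price
FROM   stock_import s
LEFT   JOIN product_pricebands p
       ON  p.template_sku = s.template_sku
       AND p.siteid       = s.siteid
       AND p.currencyid   = s.currencyid
       AND p.colourid IS NOT DISTINCT FROM s.colourid
WHERE  p.template_sku IS NULL;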

My question is - what best practice should be applied here to create a primary key? Is a primary key even needed? I wanted to make the primary key = (template_sku, siteid, currencyid, colourid) - but the colourid field can be NULL, and a nullable column cannot be part of a primary key.

From what I read on other forum posts, I think I have done the above correctly, and just need to clarify:

1) Should I use a "serial" primary key just in case I ever need one? At the moment I don't, and don't think I ever will, because the important data in the table is the price and my custom field, only identified by the (template_sku, siteid, currencyid, colourid) combination.

2) Since (template_sku, siteid, currencyid, colourid) is the combination that I will use to query a product's price, should I add any further indexing to my columns, such as the "template_sku" which is a varchar? Or is the UNIQUE constraint a good index already for my SELECTs?
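
For context, a typical price lookup looks something like this (the values are made up):

SELECT master_price
FROM   product_pricebands
WHERE  template_sku = 'SOME-SKU'
AND    siteid       = 1
AND    currencyid   = 1
AND    colourid     = 3;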

rishijd
  • colourid is nullable. That makes it a nasty member for a primary key. – wildplasser May 09 '12 at 13:06
  • OK, here's the simple answer. If you have a table with cols a,b,c,d and d is nullable, can you live with rows that have the same a,b and c fields? If so, then what you've got is fine. If you need to tell them apart, then you need a unique index on a,b,c or a partial index on a,b,c where d is null. – Scott Marlowe May 10 '12 at 02:22

1 Answer


Should I use a "serial" primary key just in case I ever need one?

You can easily add a serial column later if you need one:

ALTER TABLE product_pricebands ADD COLUMN id serial;

The column will be filled with unique values automatically. You can even make it the primary key in the same statement (if no primary key is defined yet):

ALTER TABLE product_pricebands ADD COLUMN id serial PRIMARY KEY;

If you reference the table from other tables, I would advise using such a surrogate primary key, because it is rather unwieldy to link by four columns. It is also slower in SELECTs with JOINs.
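
For illustration, a hypothetical child table (price_note and its columns are invented here) could then reference a single integer column instead of all four:

CREATE TABLE price_note (
    note_id      serial PRIMARY KEY,
    priceband_id integer NOT NULL REFERENCES product_pricebands (id) ON DELETE CASCADE,
    note         text
);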

Either way, you should define a primary key. The UNIQUE index including a nullable column is not a full replacement. It allows duplicates for combinations including a NULL value, because two NULL values are never considered the same. This can lead to trouble.
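
To illustrate (the values are made up, and assume matching rows exist in currency and site): both of these inserts succeed despite the UNIQUE constraint, because the two NULL values for colourid are not considered equal:

INSERT INTO product_pricebands (template_sku, siteid, currencyid, colourid, master_price)
VALUES ('SKU-001', 1, 1, NULL, 9.99);

INSERT INTO product_pricebands (template_sku, siteid, currencyid, colourid, master_price)
VALUES ('SKU-001', 1, 1, NULL, 9.99);  -- no unique violation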


As

the colourid field can be NULL

you might want to create two unique indexes. The combination (template_sku, siteid, currencyid, colourid) cannot be a PRIMARY KEY, because of the nullable colourid, but you can create a UNIQUE constraint like you already have (implementing an index automatically):

ALTER TABLE product_pricebands ADD CONSTRAINT product_pricebands_uni_idx
UNIQUE (template_sku, siteid, currencyid, colourid);

This index perfectly covers the queries you mention in 2).
Create a partial unique index in addition if you want to avoid "duplicates" with (colourid IS NULL):

CREATE UNIQUE INDEX product_pricebands_uni_null_idx
ON product_pricebands (template_sku, siteid, currencyid)
WHERE colourid IS NULL;

That covers all bases. I wrote more about that technique in a related answer on dba.SE.


The simple alternative to all of the above is to make colourid NOT NULL and create a primary key instead of product_pricebands_uni_idx.
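
A sketch of that route, assuming any existing NULL values have first been replaced with a dedicated "no colour" entry in the colour table:

ALTER TABLE product_pricebands ALTER COLUMN colourid SET NOT NULL;

ALTER TABLE product_pricebands ADD CONSTRAINT product_pricebands_pkey
PRIMARY KEY (template_sku, siteid, currencyid, colourid);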


Also, as you

basically DELETE most of the data

for your refill operation, it will be faster to drop the indexes that are not needed during the refill and recreate them afterwards. It is faster by an order of magnitude to build an index from scratch than to add all rows to it incrementally.
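
For example (a sketch; which indexes you can safely drop depends on whether the refill logic itself relies on them):

DROP INDEX product_pricebands_uni_null_idx;

-- ... bulk DELETE / INSERT / UPDATE here ...

CREATE UNIQUE INDEX product_pricebands_uni_null_idx
ON product_pricebands (template_sku, siteid, currencyid)
WHERE colourid IS NULL;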

How do you know which indexes are used (and needed)?

  • Test your queries with EXPLAIN ANALYZE.
  • Or use the built-in statistics. pgAdmin displays statistics in a separate tab for the selected object. You can also query the statistics views directly, as shown below.
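
A quick check against the statistics view pg_stat_user_indexes (note that idx_scan = 0 only means the index has not been used since the statistics were last reset):

SELECT indexrelname, idx_scan
FROM   pg_stat_user_indexes
WHERE  relname = 'product_pricebands';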

It may also be faster to select the few rows with my_custom_field = TRUE into a temporary table, TRUNCATE the base table, and re-INSERT the survivors. Whether that works depends on whether you have foreign keys defined that reference this table (TRUNCATE is not allowed then). It would look like this:

CREATE TEMP TABLE pr_tmp AS
SELECT * FROM product_pricebands WHERE my_custom_field;

TRUNCATE product_pricebands;
INSERT INTO product_pricebands SELECT * FROM pr_tmp;

This avoids a lot of vacuuming.

Erwin Brandstetter
  • Hi, thanks so much for this, extremely useful from many angles! Here are my comments from what I have tried so far: - I added a serial column just to make it easy to edit/delete. - By surrogate primary key, did you mean the serial key? I do need to join this price table to other product-related tables on the product list and detail pages on the website, and so I would need to query by (template_sku, siteid, currencyid, colourid). – rishijd May 10 '12 at 11:06
  • - Colourid can be null really for only two products out of 100s. These two products are not associated with colour in the system. Still, I think I need this field to be null, since there is no point adding a special table just for these two products. Also, I added a partial index as you said, "CREATE UNIQUE INDEX product_pricebands_uni_null_idx". My question now is what you said about the "deletion of data". I tried an EXPLAIN ANALYZE on an INSERT - it doesn't mention any of the unique indexes, but only the FKs: e.g. Trigger for constraint product_pricebands_colourid_fkey: time=0.238 calls=1 – rishijd May 10 '12 at 11:09
  • Trigger for constraint product_pricebands_currencyid_fkey: Trigger for constraint product_pricebands_siteid_fkey: -- Therefore, should I go ahead and drop the indexes before the DELETE, and re-insert the INDEXES after? Do I re-insert the indexes before or after I INSERT/UPDATE all the data? I'm confused on whether the indexes actually re-index themselves or if I should actually drop and add them back at the beginning and end (respectively) of each synchronisation bulk insert/update? – rishijd May 10 '12 at 11:13
  • @rishijd: Yes, the serial column is a [surrogate primary key](http://en.wikipedia.org/wiki/Surrogate_key). Indexes are always kept up to date automatically. Dropping & recreating increases performance, *if* you insert most or all of the data in a bulk operation. You can do the same with foreign key constraints unless you need data integrity to be verified *during* the INSERT. Recreate indexes and fk constraints *after* bulk-INSERT is complete. – Erwin Brandstetter May 10 '12 at 14:34
  • Just want to say thanks so much for all your help :) FYI I'm keeping the indexes intact, but your replies have been really insightful. – rishijd Jun 06 '12 at 11:47