1

I'm currently PHP building a script that has to update statistics when it finishes it's purpose. The script is accessed by web browser and depending on the traffic it may be executed simultaneously. I have to guarantee the statistics are right.

To give you the picture let's say we have a table:

CREATE TABLE statistics(
  user_id      integer NOT NULL,
  date         integer NOT NULL, -- for unix time
  stat1        integer NOT NULL DEFAULT 0,
  stat2        integer NOT NULL DEFAULT 0,
  stat3        integer NOT NULL DEFAULT 0  -- and so on...
);

-- Let's insert some testing data for a couple of users and days...
-- Day one
INSERT INTO statistics(1, 1303520820, 1, 1, 1);
INSERT INTO statistics(2, 1303520820, 1, 1, 1);
-- Day two
INSERT INTO statistics(1, 1303603200, 1, 1, 1);
INSERT INTO statistics(2, 1303603200, 1, 1, 1);
-- Day three
INSERT INTO statistics(1, 1303689600, 1, 1, 1);
INSERT INTO statistics(2, 1303689600, 1, 1, 1);

Every day a new row is inserted in the table so we could have a daily, weekly, monthly, yearly statistics. I have to be sure only one row is inserted per user_id per day. Also whenever an UPDATE query is executed it would increment the columns stat1,stat2,stat3 appropriately.

This script is expected to have quite some traffic and I want to figure out how to make things work when the script is executed and has a couple of instances working simultaneously. What method/technique do you find best for such tasks ?

tftd
  • 16,203
  • 11
  • 62
  • 106

3 Answers3

3

The simplest solution is to add a unique constraint

CREATE TABLE statistics(
  user_id      integer NOT NULL,
  date         integer NOT NULL, -- for unix time
  stat1        integer NOT NULL DEFAULT 0,
  stat2        integer NOT NULL DEFAULT 0,
  stat3        integer NOT NULL DEFAULT 0,  -- and so on...
  UNIQUE(user_id,date)
);

You should definitely do this regardless of what other measures you take.

Paul Creasey
  • 28,321
  • 10
  • 54
  • 90
  • Yes I though about that. Most probably I'll add UNIQUE only to the date column. I can't add it to the user_id column as the table can have endless rows with the same user_id :) – tftd Apr 22 '11 at 23:09
  • On second though I can't add Unique constraint to the structure. See the example in the question. The user_id can be the same and the date can be the same. Every user gets a new record in the database for every day. – tftd Apr 22 '11 at 23:35
  • 2
    `UNIQUE(user_id,date)` means that every combination of user_id and date must be unique, it is NOT the same as `UNIQUE(date),UNIQUE(user_id)`, i.e. there can only be 1 row per user per date, rather than 1 row per user and 1 row per date. – Paul Creasey Apr 23 '11 at 00:16
  • The OP will also need a trigger, I think. The trigger would determine whether to insert or update a row, and to implement the arithmetic. – Mike Sherrill 'Cat Recall' Apr 23 '11 at 00:34
  • @catcall, i would use a procedure not a trigger for that – Paul Creasey Apr 23 '11 at 00:41
  • I was talking about a trigger calling a procedure. A bare procedure can be bypassed by application code or a sleep-deprived DBA. – Mike Sherrill 'Cat Recall' Apr 23 '11 at 00:43
  • Not sure whether you're joking, but a unique constraint doesn't guarantee that all inserts and updates will go through the procedure. With suitable REVOKEs on the base tables, a procedure can protect the table from application code that exercises bad judgement. But you need more than that to protect the table from a DBA that's using bad judgement. Of course, it could be that I just don't understand what you mean. – Mike Sherrill 'Cat Recall' Apr 23 '11 at 01:20
  • @Catcall, i mean that a unique constraints prevent you insterting duplicates which is the point, using triggers to enforce business logic in case a dba does something wrong is a bit crazy in my opinion!! – Paul Creasey Apr 23 '11 at 04:11
  • The OP's problem isn't just an issue of missing a primary key or unique constraint. The OP also needs to either insert (if the key doesn't exist) or update (if the key does exist). – Mike Sherrill 'Cat Recall' Apr 23 '11 at 05:11
1

Also you can add CHECK for date value to ensure that it a multiple of 1 day:

ALTER TABLE "statistics" ADD CONSTRAINT "1day_quantum" CHECK ("date" = ("date" / 86400)::INTEGER * 86400);

Then exception will be thrown if one try insert wrong value of date.

If date field type will be TIMESTAMP or TIMESTAMPTZ then CHECK is more complicated:

ALTER TABLE "statistics" ADD CONSTRAINT "1day_quantum" CHECK ("date" = TIMESTAMP 'epoch' + ((EXTRACT(EPOCH FROM "date") / 86400)::INTEGER * 86400) * INTERVAL '1 second');

By changing 86400 (seconds count) you can adjust constraint to various quantum: 900 for 15 mins for example.

1

You need a unique constraint on the pair user_id and date, as others have said.

In order to insert without doing arithmetic when the compound key (user_id, date) doesn't exist, and to update with arithmetic when the compound key does exist, you need to write some code. Informally, this is called an "upsert". There's more than one way.

PosgreSQL docs have an example of a function that implements this kind of requirement using exception handling. The problem with a function is that you can't force application code or database girls to use it every time without exception.

You can (I think) use the suppress_redundant_updates_trigger(). The advantage of triggers is that they can't be bypassed accidentally by application code or by database girls. I have not used this technique myself, so I can't comment further on it. This trigger is documented here.

You can also handle the upsert logic with a user-defined trigger.

Community
  • 1
  • 1
Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185