
The problem domain I'm working in is returns management for eCommerce.

I'm using Postgres (11.9) and have the following tables (I've removed a number of fields from each table that aren't relevant to the question):

CREATE TABLE "order" (
    id BIGSERIAL PRIMARY KEY,
    platform text NOT NULL,
    platform_order_id text NOT NULL,
    CONSTRAINT platform_order_id_unique UNIQUE (platform, platform_order_id)
);

CREATE TABLE order_item (
    id BIGSERIAL PRIMARY KEY,
    order_id int8 NOT NULL,
    platform_item_id text NOT NULL,
    quantity integer,
    CONSTRAINT FK_order_item_order_id FOREIGN KEY (order_id) REFERENCES "order",
    CONSTRAINT platform_item_id_unique UNIQUE (order_id, platform_item_id)
);

CREATE TABLE return (
    id BIGSERIAL PRIMARY KEY,
    order_id int8 NOT NULL,
    CONSTRAINT FK_return_order_id FOREIGN KEY (order_id) REFERENCES "order"
);

CREATE TABLE return_item (
    return_id int8 NOT NULL,
    order_item_id int8 NOT NULL,
    quantity integer NOT NULL,
    CONSTRAINT FK_return_item_return_id FOREIGN KEY (return_id) REFERENCES return,
    CONSTRAINT FK_return_item_item_id FOREIGN KEY (order_item_id) REFERENCES order_item
);

To explain the domain briefly: I pull orders from eCommerce platforms and store them in my database. An order comprises one or more distinct items, each with a quantity of at least 1. When a user wishes to return an item, they may return any amount up to the ordered quantity, possibly spread across multiple returns.

Put more concretely, if I buy two black small t-shirts in a single order, you would find an order in the database with a single order_item that has a quantity of 2. I would have the ability to create two separate returns, each return with one return_item referencing the same order_item_id but with a quantity of 1.
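
In SQL terms, that scenario would look roughly like this (a sketch; the ids and platform values are made up for illustration):

INSERT INTO "order" (platform, platform_order_id)
VALUES ('shopify', '1001');                          -- say this gets id 1

INSERT INTO order_item (order_id, platform_item_id, quantity)
VALUES (1, 'tshirt-black-s', 2);                     -- say this gets id 1

-- two separate returns, each sending back one of the two shirts
INSERT INTO return (order_id) VALUES (1);            -- id 1
INSERT INTO return_item (return_id, order_item_id, quantity) VALUES (1, 1, 1);

INSERT INTO return (order_id) VALUES (1);            -- id 2
INSERT INTO return_item (return_id, order_item_id, quantity) VALUES (2, 1, 1);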

order_item and return_item are inserted in different transactions and I'm not preventing multiple transactions from updating either of those at the same time.

How can I ensure that the summed quantity of all return_item rows with a particular order_item_id does not exceed the quantity stored in the corresponding order_item row?

Put in more plain English, how do I prevent a third item from being returned when the quantity of that item in the original order was 2 as in the example I described?

It's easy enough to write an application check to catch this in most cases, and also not difficult to add a business-rule-checking WHERE clause to my return_item insertions, but neither of these solutions gives me the consistency guarantees that a uniqueness constraint does. How would I go about writing a trigger to error on insert here? Or is there a better approach than a trigger?
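
For reference, that WHERE-clause check looks roughly like this (a sketch with positional parameters; it works in the common case, but two concurrent transactions can each pass the check before either commits):

INSERT INTO return_item (return_id, order_item_id, quantity)
SELECT $1, $2, $3
WHERE  (SELECT COALESCE(sum(quantity), 0) FROM return_item WHERE order_item_id = $2) + $3
       <= (SELECT quantity FROM order_item WHERE id = $2);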


3 Answers


You ask for a trigger solution specifically. For the record, you can achieve the same with plain SQL, too, as long as you can make sure that all clients use the necessary statements.
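
A minimal sketch of what that plain SQL could look like (my illustration, using positional parameters; in default READ COMMITTED isolation the INSERT takes its snapshot only after the lock is acquired, so it sees competing returns committed in the meantime):

BEGIN;

-- serialize all returns per order item ($1 = order_item_id)
SELECT quantity
FROM   order_item
WHERE  id = $1
FOR    NO KEY UPDATE;

-- $2 = return_id, $3 = quantity to return
INSERT INTO return_item (return_id, order_item_id, quantity)
SELECT $2, $1, $3
WHERE  $3 <= (SELECT oi.quantity
                   - COALESCE((SELECT sum(ri.quantity)
                               FROM   return_item ri
                               WHERE  ri.order_item_id = $1), 0)
              FROM   order_item oi
              WHERE  oi.id = $1);

COMMIT;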

Trigger solution

You mentioned that concurrent write access is possible. That makes it more complex: two transactions might try to return an item from the same order_item at the same time. Both check, find that one more item can be returned, and insert, thereby exceeding order_item.quantity by 1. A classic concurrency caveat.

To defend against it, you might use SERIALIZABLE transaction isolation. But that's considerably more expensive, and all transactions that might write to involved tables have to stick to it.

Alternatively, take out strategic row locks in the default READ COMMITTED isolation level. Here is a basic implementation:

Trigger function:

CREATE FUNCTION trg_return_item_insup_bef()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
DECLARE
   _ordered_items int;
   _remaining_items int;
BEGIN
   SELECT quantity
   FROM   order_item
   WHERE  id = NEW.order_item_id
   FOR    NO KEY UPDATE                -- lock the parent row first ... (!!!)
   INTO   _ordered_items;              --  ... while fetching quantity

   SELECT _ordered_items - COALESCE(sum(quantity), 0)
   FROM   return_item
   WHERE  order_item_id = NEW.order_item_id
   INTO   _remaining_items;

   -- on UPDATE, the OLD row is included in the sum above; add it back
   IF TG_OP = 'UPDATE' AND OLD.order_item_id = NEW.order_item_id THEN
      _remaining_items := _remaining_items + OLD.quantity;
   END IF;

   IF NEW.quantity > _remaining_items THEN
      RAISE EXCEPTION 'Tried to return % items, but only % of % are left.'
                     , NEW.quantity, _remaining_items, _ordered_items;
   END IF;
   
   RETURN NEW;
END
$func$;

Trigger:

CREATE TRIGGER insup_bef
BEFORE INSERT OR UPDATE ON return_item
FOR EACH ROW
EXECUTE PROCEDURE trg_return_item_insup_bef();

db<>fiddle here

Any attempt to return items locks the parent row in order_item first. Competing transactions have to wait until this one is committed, and will then see the newly committed rows. That eliminates the race condition. FOR NO KEY UPDATE is the right lock strength: neither too weak nor too strong.
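
To illustrate with the t-shirt example from the question (assuming order_item id 1 with quantity 2, and matching return rows with ids 1 through 3):

INSERT INTO return_item (return_id, order_item_id, quantity) VALUES (1, 1, 1);  -- ok
INSERT INTO return_item (return_id, order_item_id, quantity) VALUES (2, 1, 1);  -- ok, 2 of 2 returned
INSERT INTO return_item (return_id, order_item_id, quantity) VALUES (3, 1, 1);
-- ERROR:  Tried to return 1 items, but only 0 of 2 are left.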

Writes to order_item can also interfere with item totals, but those take out an (implicit) row lock as well and are forced to queue in the same fashion. If later updates to order_item.quantity are possible, though, you'll have to add a similar check in a trigger there, in case the quantity is lowered.
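
Such a check could look roughly like this (my sketch, not part of the original answer):

CREATE FUNCTION trg_order_item_upd_bef()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- reject lowering quantity below what has already been returned
   IF NEW.quantity < (SELECT COALESCE(sum(quantity), 0)
                      FROM   return_item
                      WHERE  order_item_id = NEW.id) THEN
      RAISE EXCEPTION 'Cannot lower quantity of order_item % below the quantity already returned.', NEW.id;
   END IF;

   RETURN NEW;
END
$func$;

CREATE TRIGGER upd_bef
BEFORE UPDATE OF quantity ON order_item
FOR EACH ROW
WHEN (NEW.quantity < OLD.quantity)
EXECUTE PROCEDURE trg_order_item_upd_bef();

The UPDATE itself takes a row lock that conflicts with the FOR NO KEY UPDATE lock from the return_item trigger, so the two checks cannot race each other.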

I added basic info to the error message raised when a quantity is exceeded. You might put more or less info there.

The example setup could be improved: "order" is a reserved word; the table return and the column return_item.return_id aren't needed for the example; return_item is missing a PK; and order_item.quantity should be NOT NULL CHECK (quantity > 0). The COALESCE in the trigger function would then be redundant. But these are secondary notes.

Erwin Brandstetter

The only solution I can think of is denormalization.

Add an integer column total_returns to order_item that is modified by a trigger on return_item whenever rows are added or deleted or the quantity changes.

Then you can have a simple check constraint on order_item that makes sure that your invariant holds.

Some sample code:

BEGIN;

/* for consistency */
ALTER TABLE order_item
   ALTER quantity SET NOT NULL,
   ALTER quantity SET DEFAULT 0;

ALTER TABLE order_item
   ADD total_returns bigint DEFAULT 0 NOT NULL;

ALTER TABLE order_item
   ADD CONSTRAINT not_too_many_returns
      CHECK (total_returns <= quantity);

/* trigger function */
CREATE FUNCTION return_order_trig() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   IF TG_OP = 'INSERT' OR TG_OP = 'UPDATE' THEN
      UPDATE order_item
      SET total_returns = total_returns + NEW.quantity
      WHERE id = NEW.order_item_id;
   END IF;

   IF TG_OP = 'DELETE' OR TG_OP = 'UPDATE' THEN
      UPDATE order_item
      SET total_returns = total_returns - OLD.quantity
      WHERE id = OLD.order_item_id;
   END IF;

   RETURN NULL;
END;$$;

CREATE TRIGGER return_order_trig
   AFTER INSERT OR UPDATE OR DELETE ON return_item
   FOR EACH ROW EXECUTE PROCEDURE return_order_trig();

UPDATE order_item AS oi
SET total_returns = coalesce((SELECT sum(quantity)
                              FROM return_item AS r
                              WHERE r.order_item_id = oi.id), 0);

COMMIT;
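
As a quick sanity check (my example, assuming an order_item with id 1 and quantity 2 as in the question, both units already returned, and a return row with id 3), a third return now fails the check constraint:

INSERT INTO return_item (return_id, order_item_id, quantity) VALUES (3, 1, 1);
-- ERROR:  new row for relation "order_item" violates check constraint "not_too_many_returns"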
Laurenz Albe
  • This is a great answer and certainly works. I'm wondering, though: is it possible to do this without the denormalization / auto-updating column? Simply by running a query via a trigger at insertion time for the new return_item and rejecting the insertion if the total quantity would exceed the allowed quantity? If that's possible, that would be cleanest IMO. I've never actually written triggers before, so I'm unsure what's possible – mistahenry Dec 23 '20 at 17:05
  • That is possible, but to avoid race conditions you need `SERIALIZABLE` or pessimistic locking, which is also not great for performance. – Laurenz Albe Jan 07 '21 at 07:07

I would not recommend storing this derived information, because it is tedious to maintain.

Instead, you can implement the logic in the DML query. Consider the following query to register a new returned item:

insert into return_item (return_id, order_item_id, quantity)
select v.*
from (values ($1, $2, $3)) as v(return_id, order_item_id, quantity)
inner join order_item oi on oi.id = v.order_item_id
where oi.quantity >= v.quantity + (
    select coalesce(sum(ri.quantity), 0) from return_item ri where ri.order_item_id = v.order_item_id
);

Input values are given as $1, $2 and $3 (return_id is NOT NULL in the schema, so it must be supplied too). The query brings in the corresponding row from order_item and checks that the overall returned quantity for that item, including the new return, does not exceed the ordered amount.

The whole logic is implemented in a single query, which narrows the window for race conditions considerably. Strictly speaking, though, two concurrent inserts under the default READ COMMITTED isolation can still both pass the check, since neither sees the other's uncommitted row; combine this with row locking or SERIALIZABLE isolation (see the locking discussion in the first answer) for a hard guarantee.

From your application, you can check if the query affected any row. If it didn't, then you know the return was rejected.
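
If you prefer to do that check in SQL itself, a RETURNING clause is one option (a sketch; checking the driver's affected-row count works just as well):

-- appended to the insert above:
returning order_item_id;  -- no row returned means the return was rejected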

You can put the query in a stored procedure if you are going to use it on a regular basis.

GMB