
PostgreSQL version is 9.0.

I have to optimize a plpgsql function. The idea is to run through all documents and test whether the related rows 901, 902, 905 and 907 already exist in the table webdte.doc_tip_cifra. If they don't exist yet, insert null rows to satisfy a later validation. At the moment it is ridiculously slow, even if I use only one of the 4 conditions and run it against half the rows. Does anybody have an idea for a performance boost?

CREATE OR REPLACE FUNCTION webdte.addtagobligatoriosventa(idlibro bigint)
  RETURNS character AS
$BODY$
DECLARE                 
    id_documento bigint;
    validador integer;
    validador1 integer;
    validador2 integer;
    validador3 integer;
    validador4 integer;

    tipo_cifra integer;
    --counts integer[];
BEGIN
    SELECT INTO validador1, validador2, validador3, validador4
             max(CASE id_tipo_cifra WHEN 901 THEN 1 ELSE 0 END)
            ,max(CASE id_tipo_cifra WHEN 902 THEN 1 ELSE 0 END)
            ,max(CASE id_tipo_cifra WHEN 905 THEN 1 ELSE 0 END)
            ,max(CASE id_tipo_cifra WHEN 907 THEN 1 ELSE 0 END)
    FROM   webdte.doc_tip_cifra
    WHERE  id_doc = id_documento;

    if (validador1 = 0) then
        insert into webdte.doc_tip_cifra (id_doc, id_tipo_cifra, tasa_imp, val_imp)
        values (id_documento, 901, 0, 0);
    end if;

    if (validador2 = 0) then
        insert into webdte.doc_tip_cifra (id_doc, id_tipo_cifra, tasa_imp, val_imp)
        values (id_documento, 902, 0, 0);
    end if;

    if (validador3 = 0) then
        insert into webdte.doc_tip_cifra (id_doc, id_tipo_cifra, tasa_imp, val_imp)
        values (id_documento, 905, 0, 0);
    end if;

    if (validador4 = 0) then
        insert into webdte.doc_tip_cifra (id_doc, id_tipo_cifra, tasa_imp, val_imp)
        values (id_documento, 907, 0, 0);
    end if;

END;
$BODY$
  LANGUAGE plpgsql VOLATILE;

Maybe it is better to use an insert trigger that adds the 4 null rows to doc_tip_cifra on every document insert, to avoid this expensive loop over all documents with 4 tests per document? What do you think?


1 Answer


Turns out you don't actually need the count; your preceding question had conveyed that impression. However, just substituting max() for sum() in my earlier solution doesn't get you very far.

It works, yes, but it is insanely inefficient. You don't have to scan the rest of the table after you have found a matching row. That's what EXISTS semi-joins are for. I propose this completely different approach:

INSERT INTO webdte.doc_tip_cifra (id_doc, id_tipo_cifra, tasa_imp, val_imp)
SELECT id_documento, 901, 0, 0
WHERE  NOT EXISTS (
    SELECT 1
    FROM   webdte.doc_tip_cifra
    WHERE  id_doc = id_documento
    AND    id_tipo_cifra = 901
    );

INSERT INTO webdte.doc_tip_cifra (id_doc, id_tipo_cifra, tasa_imp, val_imp)
SELECT id_documento, 902, 0, 0
WHERE  NOT EXISTS (
    SELECT 1
    FROM   webdte.doc_tip_cifra
    WHERE  id_doc = id_documento
    AND    id_tipo_cifra = 902
    );

INSERT INTO webdte.doc_tip_cifra (id_doc, id_tipo_cifra, tasa_imp, val_imp)
SELECT id_documento, 905, 0, 0
WHERE  NOT EXISTS (
    SELECT 1
    FROM   webdte.doc_tip_cifra
    WHERE  id_doc = id_documento
    AND    id_tipo_cifra = 905
    );

INSERT INTO webdte.doc_tip_cifra (id_doc, id_tipo_cifra, tasa_imp, val_imp)
SELECT id_documento, 907, 0, 0
WHERE  NOT EXISTS (
    SELECT 1
    FROM   webdte.doc_tip_cifra
    WHERE  id_doc = id_documento
    AND    id_tipo_cifra = 907
    );

You can wrap this in a plpgsql or SQL function, or just run it as plain SQL.
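For illustration, a minimal sketch of such a plpgsql wrapper; the function name and the RETURNS void signature are placeholders of mine, not part of your original function:

-- Minimal sketch of a plpgsql wrapper (function name and signature are placeholders).
CREATE OR REPLACE FUNCTION webdte.add_tags_obligatorios(id_documento bigint)
  RETURNS void AS
$BODY$
BEGIN
    INSERT INTO webdte.doc_tip_cifra (id_doc, id_tipo_cifra, tasa_imp, val_imp)
    SELECT id_documento, 901, 0, 0
    WHERE  NOT EXISTS (
        SELECT 1
        FROM   webdte.doc_tip_cifra
        WHERE  id_doc = id_documento
        AND    id_tipo_cifra = 901
        );

    -- ... repeat the same statement for 902, 905 and 907.
END;
$BODY$
  LANGUAGE plpgsql;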

Unlike with your preceding question, this can most likely make use of suitable indexes. The optimum would be a multi-column index like:

CREATE INDEX doc_tip_cifra_special_idx
ON webdte.doc_tip_cifra (id_doc, id_tipo_cifra);

That should make your queries lightning-fast.

Also, this kind of algorithm has an inherent problem with concurrency. The time window between checking if a row already exists and inserting it should be as small as possible. Putting it all in one query is the optimum in this regard.
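As a variation on the same NOT EXISTS technique (my sketch, not something from the statements above), the four statements can even be collapsed into a single INSERT with a VALUES list:

INSERT INTO webdte.doc_tip_cifra (id_doc, id_tipo_cifra, tasa_imp, val_imp)
SELECT id_documento, v.id_tipo_cifra, 0, 0
FROM  (VALUES (901), (902), (905), (907)) v(id_tipo_cifra)
WHERE NOT EXISTS (
    SELECT 1
    FROM   webdte.doc_tip_cifra d
    WHERE  d.id_doc = id_documento
    AND    d.id_tipo_cifra = v.id_tipo_cifra
    );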

Still, it's not perfect. If your database sees a lot of concurrency, you might be interested in this excellent blog post by @depesz or read more under this related question.


And yes, solving that with a trigger sounds like a good idea. I would do it.
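A minimal sketch of what such a trigger could look like; the documents table webdte.documento and its primary key column id_doc are guesses on my part and have to be adapted to your actual schema:

-- Trigger function: insert the four default rows for every new document.
CREATE OR REPLACE FUNCTION webdte.trg_doc_tip_cifra_defaults()
  RETURNS trigger AS
$BODY$
BEGIN
    INSERT INTO webdte.doc_tip_cifra (id_doc, id_tipo_cifra, tasa_imp, val_imp)
    VALUES (NEW.id_doc, 901, 0, 0)
         , (NEW.id_doc, 902, 0, 0)
         , (NEW.id_doc, 905, 0, 0)
         , (NEW.id_doc, 907, 0, 0);
    RETURN NEW;
END;
$BODY$
  LANGUAGE plpgsql;

CREATE TRIGGER doc_tip_cifra_defaults
AFTER INSERT ON webdte.documento   -- assumed documents table
FOR EACH ROW EXECUTE PROCEDURE webdte.trg_doc_tip_cifra_defaults();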

  • thank you so much! I am just learning postgres and you really opened me the eyes with this one. cool.. – mugdiman Jul 24 '12 at 04:05