2

I have the following tables:

CREATE TABLE element (
  element_id serial PRIMARY KEY,
  local_id integer,
  name varchar,
  CONSTRAINT fk_element_local_id FOREIGN KEY (local_id)
      REFERENCES local (local_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE local (
  local_id serial PRIMARY KEY,
  parent_id integer,
  name varchar,
  CONSTRAINT fk_local_parent_id_local_id FOREIGN KEY (parent_id)
      REFERENCES local (local_id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE SET NULL
);

CREATE TABLE category (
  category_id serial PRIMARY KEY,
  name varchar
);

CREATE TABLE action (
  action_id serial PRIMARY KEY,
  local_id integer,
  category_id integer,
  CONSTRAINT fk_action_local_id FOREIGN KEY (local_id)
      REFERENCES local (local_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_action_element_id FOREIGN KEY (element_id)
      REFERENCES element (element_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
);

I want to select all elements from an action. If the local from the element is descendant from the local of the action it should appear as well.
example:

Table local:

|local_id | parent_id | name |
|---------+-----------+------|
|1        |NULL       |A     |
|2        |1          |B     |
|3        |1          |C     |
|4        |3          |D     |
|5        |NULL       |E     |
|6        |5          |F     |
|_________|___________|______|

Table category:

| category_id | name |
|-------------+------|
|1            |A     |
|2            |B     |
|2            |C     |
|_____________|______|

Table element:

|element_id | local_id | name | category_id |
|-----------+----------+------+-------------|
|1          |1         |A     | 1           |
|2          |2         |B     | 2           |
|3          |2         |C     | 1           |
|4          |4         |D     | 2           |
|5          |5         |E     | 2           |
|6          |6         |F     | 1           |
|7          |6         |G     | 1           |
|___________|__________|______|_____________|

Table action:

|action_id | local_id | category_id |
|----------+----------+-------------|
| 1        | 1        | 2           |
| 2        | 3        | 1           |
| 3        | 5        | 1           |
| 4        | 6        | 1           |
|__________|__________|_____________|

The results from query that I want:

CASE: action_id = 1
return: element_id: 2,4

CASE: action_id = 2
return: element_id: null

CASE: action_id = 3
return: element_id: 6,7

I've made a function that returns all the descendants including the actual node but I'm having a hard time because of the performance when calling the function thousands of times. My function looks like this:

CREATE OR REPLACE FUNCTION fn_local_get_childs(_parent_id integer)
  RETURNS SETOF integer AS
$BODY$
DECLARE
   r integer;
BEGIN
   FOR r IN SELECT local_id FROM local WHERE local_id IN ( 
      (WITH RECURSIVE parent AS
      (
         SELECT local_id , parent_id  from local WHERE local_id = _parent_id
         UNION ALL 
         SELECT t.local_id , t.parent_id FROM parent
         INNER JOIN local t ON parent.local_id =  t.parent_id
      )
      SELECT local_id FROM  parent
      ) 
   )
   LOOP
      RETURN NEXT r;
   END LOOP;
   RETURN;        
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100
  ROWS 1000;

And my ultra slow query looks like these:

select e.element_id, a.action_id
from action a
join element e on (
                   e.local_id=any(select fn_local_get_childs(a.local_id)) AND 
                   e.category_id=a.category_id)

Is there a way of combining the recursion used in the function in a single query?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
NewK
  • 343
  • 3
  • 21
  • 1
    Since your PL/PgSQL function is a simple wrapper around a single query, replace it with an `LANGUAGE sql` function that's declared `STABLE` (since it appears to be) but *not* `STRICT`. The query planner may then inline the query. Examine the `explain analyze` output (paste it here, preserving formatting, or paste to explain.depesz.com and put the link here) and see what happens. – Craig Ringer May 25 '13 at 03:37
  • 1
    1) why use a function ? 2) why in plpgsql when it could be done in plain SQL ? 3) replace the `IN(...)` subquery by an `EXISTS(...)` subquery (or a plain JOIN) – wildplasser May 25 '13 at 16:08
  • There are ways to integrate the recursion in your query. The fastest way depends on the details. Cardinality of the tables? Depth of the recursion. Distribution of categories? Do you need additional columns in the result? Do you want all rows like the example suggests or are there additional criteria to select only few? – Erwin Brandstetter May 25 '13 at 20:57

1 Answers1

1

Integrate query

Improving the logic in several places, you can integrate the whole operation in a single query. Wrapping into an SQL function is optional:

CREATE OR REPLACE FUNCTION f_elems(_action_id integer)
  RETURNS SETOF integer AS
$func$
   WITH RECURSIVE l AS (
      SELECT a.category_id, l.local_id
      FROM   action a
      JOIN   local  l USING (local_id)
      WHERE  a.action_id = $1

      UNION ALL 
      SELECT l.category_id, c.local_id
      FROM   l
      JOIN   local c ON c.parent_id = l.local_id  -- c for "child"
      )
   SELECT e.element_id
   FROM   l
   JOIN   element e USING (category_id, local_id);
$func$  LANGUAGE sql STABLE;

Retrieves all element_id for same and child-locals of a given action_id.

Call:

SELECT * FROM f_elem(3);

element_id
-----------
6
7

db<>fiddle here
OLD sqlfiddle

This should be substantially faster already for several reasons. The most obvious ones being:

  • Substitute pure SQL for slow looping in plpgsql.
  • Narrow down starting set of the recursive query.
  • Remove unnecessary and notoriously slow IN construct.

I am calling with SELECT * FROM ... instead of just SELECT, even though the row has only a single column, to get the column name of the OUT parameter (element_id) I declared in the function header.

Faster, yet

Indices

An index on action.action_id is provided by the primary key.

But you may have missed the index on local.parent_id. While being at it, make that a covering multi-column index (Postgres 9.2+) with parent_id as first element and local_id as second. This should help a lot if the table local is big. Not so much or not at all for a small table:

CREATE INDEX l_mult_idx ON local(parent_id, local_id);

Why? See:

Finally, a multi-column index on table element should help some more:

CREATE INDEX e_mult_idx ON element (category_id, local_id, element_id);

The third column element_id is only useful to make it a covering index. If your query retrieves more columns from table element, you may want to add more columns to the index or drop element_id. Either will make it faster.

Materialized view

If your tables receive few or no updates, a materialized view providing the pre-computed set of all pairs (action_id, element_id) sharing the same category would make this lightening-fast. Make (action_id, element_id) (in that order) the primary key.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228