6

I know the answer would seem to be to use "WITH RECURSIVE" as per this post but I'm just not getting it.

I have a table called people, and a table called position_hierarchy. The people table has a unique id uperson_id and position id we call pcn and an enabled flag (because when somebody leaves and is replaced, their replacement gets the same pcn). The position_hierarchy has the column pcn, and another column reports_to which is the pcn of the person above them in the hierarchy. What I want to do is give a person's uperson_id and find all the uperson_ids of the people above them in the hierarchy, and/or give a uperson_id and another person's uperson_id and tell if the second person somebody who has a supervisory position over the first.

The president of the company is indicated because their pcn is the same as their reports_to. (Not my decision - I would have used a null reports_to)

What I came up with so far is:

with recursive parents (uperson_id, pcn, reports_to) as
(
 select p1.uperson_id, ph1.pcn, ph1.reports_to
 from people p1
 join position_hierarchy ph1 on ph1.pcn = p1.pcn
 where reports_to != ph1.pcn and active_revoke_flag = '0'

 union all

 select p2.uperson_id, ph2.pcn, ph2.reports_to
 from people p2
 join position_hierarchy ph2 on p2.pcn = ph2.pcn
 join parents pp on pp.pcn = ph2.reports_to
)
select parents.* from parents where uperson_id = 'aaa3644';

but that returns 5 rows with the same uperson_id, pcn and reports_to (which seems like the right number of rows, but I want the supervisor's uperson_id at each level. I feel like I'm missing something very basic, and I'll probably slap my head when you tell me what I'm doing wrong.

What I did

Based on Erwin Brandstetter's answer, I fixed a few things (mostly because I didn't make clear which table the active_revoke_flag was in) and came up with:

with recursive p as (
    select pcn, reports_to
    from   position_hierarchy
    where  pcn = (SELECT pcn FROM people WHERE uperson_id = 'aaa3644')
    union all
    select ph2.pcn, ph2.reports_to
    from   p
    join   position_hierarchy ph2 ON ph2.pcn = p.reports_to AND
           p.pcn != p.reports_to
)
select p2.uperson_id, p2.active_revoke_flag, p.*
from   p
join   people p2 USING (pcn)
where  p2.active_revoke_flag = '0';
Community
  • 1
  • 1
Paul Tomblin
  • 179,021
  • 58
  • 319
  • 408

2 Answers2

4

I would try this bottom up approach, start with the person of interest and work my way up:

with recursive p as (
    select p1.uperson_id, p1.pcn, ph1.reports_to
    from   people p1
    join   position_hierarchy ph1 USING (pcn)
    where  ph1.active_revoke_flag = '0'
    and    p1.uperson_id = 'aaa3644'

    union all

    select p2.uperson_id, p2.pcn, ph2.reports_to
    from   p
    join   position_hierarchy ph2 ON ph2.pcn = p.reports_to
                                 AND ph2.active_revoke_flag = '0'
    join   people p2 ON p2.pcn = ph2.pcn
)
select * from p;

Or, faster, because we only join to person once:

with recursive p as (
    select pcn, reports_to
    from   position_hierarchy
    where  active_revoke_flag = '0'
    and    pcn = (SELECT pcn FROM person WHERE uperson_id = 'aaa3644')

    union all

    select ph2.pcn, ph2.reports_to
    from   p
    join   position_hierarchy ph2 ON ph2.pcn = p.reports_to
                                 AND ph2.active_revoke_flag = '0'
)
select p2.uperson_id, p.*
from   p
join   people p2 USING (pcn); -- assuming pcn is unique in table person

As an aside: I do find your design with duplicate pcn somewhat dubious.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
-2

If you know the maximum tree depth, you can kludge it into a single query; otherwise, you need to read Joe Chelko and restructure your tables.

A single query for a maximum depth of 5 might look like this (untested):

select distinct ph1.pcn
from position_hierarchy ph1
join position_hierarchy ph2 on ph1.pcn = ph2.reports_to
left join position_hierarchy ph3 on ph2.pcn = ph3.reports_to
left join position_hierarchy ph4 on p3.pcn = ph4.reports_to
left join position_hierarchy ph5 on ph4.pcn = ph5.reports_to
where ph2.pcn = @my_pcn
or ph3.pcn = @my_pcn
or ph4.pcn = @my_pcn
or ph5.pcn = @my_pcn;

The result will be a list of the pcn's of all of @my_pcn's superiors. You could also add something to test for the edge case of the company president.

Chelko wrote the book on hierarchies in SQL, literally: http://books.google.ca/books/about/Joe_Celko_s_Trees_and_Hierarchies_in_SQL.html?id=Jy4e8SbYO6sC

Canuck
  • 565
  • 4
  • 13
  • 1
    A recursive query is a perfect fit for this kind of problem. No need to do that mulit-join thing –  Sep 01 '12 at 22:16
  • 1
    I'm not planning to restructure the tables or assume a maximum depth, but thanks for the link to the book. – Paul Tomblin Sep 01 '12 at 22:26
  • 3
    I downvoted because you are factually wrong on the need to restructure tables. It is worth noting that different database systems have different approaches to this problem and your approach is probably only needed in a lowest-common-denominator db like MySQL. – Chris Travers Sep 02 '12 at 04:57
  • 1
    Celko's Trees and Hierarchies is a workaround for databases that don't support recursion. If you have recursion available, don't use a bad schema design. It's slow, not reliable and hard to maintain. – Frank Heikens Sep 02 '12 at 08:06
  • @FrankHeikens: I disagree about the slowness and the "bad schema design". What's wrong with "Closure" for example compared with hierarchies and recursive queries? – ypercubeᵀᴹ Sep 02 '12 at 08:19
  • 1
    @ypercube: Referential integrity is one of the problems, inserting and updating is another one: You have to update many records for inserting or updating a single record. – Frank Heikens Sep 02 '12 at 09:25
  • 1
    Yes, I agree, abit more complex inserts and updates. But the queries will be simpler (and in occasions that means faster). And cascading deletes (if you need them) are easier to deal with. – ypercubeᵀᴹ Sep 02 '12 at 09:35
  • Chris Travers (and others) - has PostgreSQL actually changed the basic math of relational data somehow, or does a recursive query just execute the same statement multiple times on the server side, the same way a custom-written stored procedure would? Normally, there are two problems with using recursive queries in any RDBMS (with or without syntactic sugar): (1) they can execute the same statement multiple times, and (2) their performance isn't predictable. I'm guessing that Paul's app isn't high-demand enough for either of those to matter, though. – Canuck Sep 02 '12 at 14:21
  • PostgreSQL chest-thumping aside, the other problem that Chelko's book solved is selecting a subtree: for example, instead of querying who a person reports to (a simple list of ancestors), what if Paul had to query everyone who reports to a person, directly or indirectly? There could be thousands of results in a big company, and you wouldn't want PostgreSQL to execute the query statement thousands of times internally. – Canuck Sep 02 '12 at 14:47