PGSQL Tree relations
Hello, I just got a handle on this for a project I'm working on and figured I'd share my write-up
Hope this helps. Let's get started with some prereqs
This is essentially the closure table
solution mentioned above Using recursive calls. Thanks for those slides they are very useful I wish i saw them before this write up :)
pre-requisites
Recursive Functions
these are functions that call themselves ie
function factorial(n) {
if (n = 0) return 1; //base case
return n * factorial(n - 1); // recursive call
}
This is pretty cool luckily pgsql has recursive functions too but it can be a bit much. I prefer functional stuff
cte with pgsql
WITH RECURSIVE t(n) AS (
VALUES (1) -- nonrecusive term
UNION ALL
SELECT n+1 FROM t WHERE n < 100 -- recusive term
--continues until union adds nothing
)
SELECT sum(n) FROM t;
The general form of a recursive WITH query is always a non-recursive term, then UNION (or UNION ALL), then a recursive term, where only the recursive term can contain a reference to the query's own output. Such a query is executed as follows:
Recursive Query Evaluation
Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.
So long as the working table is not empty, repeat these steps:
a. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.
b. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
to do something like factorial in sql you need to do something more like this so post
ALTER FUNCTION dbo.fnGetFactorial (@num int)
RETURNS INT
AS
BEGIN
DECLARE @n int
IF @num <= 1 SET @n = 1
ELSE SET @n = @num * dbo.fnGetFactorial(@num - 1)
RETURN @n
END
GO
Tree data structures (more of a forest :)
wikipedia
The import thing to note is that a tree is a subset of a graph, This can be simply enforced by
the relationship each node has only one parent.
Representing the Tree in PGSQL
I think it will be easiest to work it out a little more theoretically before we move on to the sql
The simple way of represent a graph relation without data duplication is by separating the nodes(id, data)
from the edges.
We can then restrict the edges(parent_id, child_id)
table to enforce our constraint. be mandating that parent_id,child_id
as well as just child id be unique
create table nodes (
id uuid default uuid_generate_v4() not null unique ,
name varchar(255) not null,
json json default '{}'::json not null,
remarks varchar(255),
);
create table edges (
id uuid default uuid_generate_v4() not null,
parent_id uuid not null,
child_id uuid not null,
meta json default '{}'::json,
constraint group_group_id_key
primary key (id),
constraint group_group_unique_combo
unique (parent_id, child_id),
constraint group_group_unique_child
unique (child_id),
foreign key (parent_id) references nodes
on update cascade on delete cascade,
foreign key (child_id) references nodes
on update cascade on delete cascade
);
Note that theoretical this can all be done with only one table by simply putting the parent_id in the nodes table
and then
CREATE VIEW v_edges as (SELECT id as child_id, parent_id FROM nodes)
but for the proposal of flexibility and so that we can incorporate other graph structures to this
framework I will use the common many-to-many relationship structure. This will ideally allow this research to be
expanded into other graph algorithms.
Let's start out with a sample data structure
INSERT (id, my_data) VALUES ('alpha', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('bravo', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('charly', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('berry', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('zeta', 'my big data') INTO nodes
INSERT (id, my_data) VALUES ('yank', 'my big data') INTO nodes
INSERT (parent_id, child_id) VALUES ('alpha', 'bravo') INTO edges
INSERT (parent_id, child_id) VALUES ('alpha', 'berry') INTO edges
INSERT (parent_id, child_id) VALUES ('bravo', 'charly') INTO edges
INSERT (parent_id, child_id) VALUES ('yank', 'zeta') INTO edges
-- rank0 Alpha Yank
-- rank1 Bravo Berry Zeta
-- rank2 Charly
Note the interesting properties of a tree (number of edges e) =( number of nodes n)-1
each child has exactly one parent.
We can then simplify the equations
let n = node
let p = parent
let c = child
let ns = nodes = groups
let es = edges = group_group // because this is a relationship of a group entity to another group entity
So now what sort of questions will we ask.
"Given an arbitrary set of groups 's' what is the coverage of the graph assuming nodes inherit their children?"
This is a tricky question, it requires us to traverse the graph and find all children of each node in s
This continues off of this stack overflow post
-- some DBMS (e.g. Postgres) require the word "recursive"
-- some others (Oracle, SQL-Server) require omitting the "recursive"
-- and some (e.g. SQLite) don't bother, i.e. they accept both
-- drop view v_group_descendant;
create view v_group_descendant as
with recursive descendants -- name for accumulating table
(parent_id, descendant_id, lvl) -- output columns
as
( select parent_id, child_id, 1
from group_group -- starting point, we start with each base group
union all
select d.parent_id, s.child_id, d.lvl + 1
from descendants d -- get the n-1 th level of descendants/ children
join group_group s -- and join it to find the nth level
on d.descendant_id = s.parent_id -- the trick is that the output of this query becomes the input
-- Im not sure when it stops but probably when there is no change
)
select * from descendants;
comment on view v_group_descendant is 'This aggregates the children of each group RECURSIVELY WOO ALL THE WAY DOWN THE TREE :)';
after we have this view we can join with our nodes/groups to get out data back i will not provide these samples for every single step for the most part we will just work with ids.
select d.*, g1.group_name as parent, g2.group_name as decendent --then we join it with groups to add names
from v_group_descendant d, groups g1, groups g2
WHERE g1.id = d.parent_id and g2.id = d.descendant_id
order by parent_id, lvl, descendant_id;
sample output
+------------------------------------+------------------------------------+---+----------+---------+
|parent_id |descendant_id |lvl|parent |decendent|
+------------------------------------+------------------------------------+---+----------+---------+
|3ef7050f-2f90-444a-a20d-c5cbac91c978|6c758087-a158-43ff-92d6-9f922699f319|1 |bravo |charly |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|3ef7050f-2f90-444a-a20d-c5cbac91c978|1 |alpha |bravo |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|7135b0c6-d59c-4c27-9617-ddcf3bc79419|1 |alpha |berry |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|6c758087-a158-43ff-92d6-9f922699f319|2 |alpha |charly |
|42529e8a-75b0-4242-a51a-ac60a0e48868|44758087-a158-43ff-92d6-9f922699f319|1 |yank |zeta |
+------------------------------------+------------------------------------+---+----------+---------+
Note that this is just the minimal node descendant relationship and has actual lost all nodes with 0 children such as charly.
In order to resolve this we need to add all nodes back which don't appear in the descendants list
create view v_group_descendant_all as (
select * from v_group_descendant gd
UNION ALL
select null::uuid as parent_id,id as descendant_id, 0 as lvl from groups g
where not exists (select * from v_group_descendant gd where gd.descendant_id = g.id )
);
comment on view v_group_descendant is 'complete list of descendants including rank 0 root nodes descendant - parent relationship is duplicated for all levels / ranks';
preview
+------------------------------------+------------------------------------+---+----------+---------+
|parent_id |descendant_id |lvl|parent |decendent|
+------------------------------------+------------------------------------+---+----------+---------+
|3ef7050f-2f90-444a-a20d-c5cbac91c978|6c758087-a158-43ff-92d6-9f922699f319|1 |bravo |charly |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|3ef7050f-2f90-444a-a20d-c5cbac91c978|1 |alpha |bravo |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|7135b0c6-d59c-4c27-9617-ddcf3bc79419|1 |alpha |berry |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|6c758087-a158-43ff-92d6-9f922699f319|2 |alpha |charly |
|42529e8a-75b0-4242-a51a-ac60a0e48868|44758087-a158-43ff-92d6-9f922699f319|1 |yank |zeta |
|null |c1529e8a-75b0-4242-a51a-ac60a0e48868|0 |null |alpha |
|null |42529e8a-75b0-4242-a51a-ac60a0e48868|0 |null |yank |
+------------------------------------+------------------------------------+---+----------+---------+
Lets say for example we are getting our set s of groups bases on a users(id , data)
table with a user_group(user_id, group_id)
relation
We can then join this to another table removing duplicates because our set s
of user_group relations may cause
duplicates if a users is say assigned to both alpha assigned charly
+------+--------+
| user | group |
+------+--------+
| jane | alpha |
| jane | charly |
| kier | yank |
| kier | bravo |
+------+--------+
--drop view v_user_group_recursive;
CREATE VIEW v_user_group_recursive AS (
SELECT DISTINCT dd.descendant_id AS group_id, ug.user_id
FROM v_group_descendant_all dd , user_group ug
WHERE (ug.group_id = dd.descendant_id
OR ug.group_id = dd.parent_id) -- should gic
);
SELECT * FROM v_user_group_recursive;
+------+--------+
| user | group |
+------+--------+
| jane | alpha |
| jane | bravo |
| jane | berry |
| jane | charly |
-- | jane | charly | Removed by DISTINCT
| kier | yank |
| kier | zeta |
| kier | bravo |
| kier | charly |
+------+--------+
If we want we can now group by node and join we can do somthing k like the fallowing
CREATE VIEW v_user_groups_recursive AS (
SELECT user_id, json_agg(json_build_object('id', id,'parent_id',parent_id, 'group_name', group_name, 'org_id', org_id, 'json', json, 'remarks', remarks)) as groups
FROM v_user_group_recursive ug, v_groups_parent g
WHERE ug.group_id = g.id GROUP BY user_id
);
comment on view v_user_group_recursive is 'This aggregates the groups for each user recursively ';
+------+-------------------------------+
| user | groups |
+------+-------------------------------+
| jane | [alpha, bravo, berry, charly] |
| kier | [yank, zeta, bravo, charly] |
+------+-------------------------------+
This is awesome we have answered the question. We now can simply ask which groups this use inherits
SELECT * from v_user_groups_recursive where user_id = 'kier
Displaying our hard work in the front end
And further we could use somthing like jstree.com to display
our structure
async function getProjectTree(user_id) {
let res = await table.query(format('SELECT * from v_user_groups_recursive ug WHERE ug.user_id = %L', user_id));
if (res.success) {
let rows = res.data[0].groups.map(r => {
return {
id: r.id, // required
parent: r.parent_id==null?'#':r.parent_id,// required
text: r.group_name,// node text
icon: 'P', // string for custom
state: {
opened: true, // is the node open
disabled: false, // is the node disabled
selected: false, // is the node selected
},
li_attr: {}, // attributes for the generated LI node
a_attr: {} // attributes for the generated A node
}
})
return {success: true, data: rows, msg: 'Got all projects'}
} else return res;
}
<div id="v_project_tree" class="row col-10 mx-auto" style="height: 25vh"></div>
<script>
function buildTree() {
bs.sendJson('get', "/api/projects/getProjectTree").then(res => {
bs.resNotify(res);
if (!res.success) {
//:(
console.error(':(');
return
}
console.log(res.data);
$('#v_project_tree').jstree({
'core': {
'data': res.data
}
});
})
}
window.addEventListener('load', buildTree);
</script>
jstree preview
blog