10

I am developing a system which needs to allow users to be placed into groups. These groups can be freely created, edited, and deleted by other privileged users in the system. That part is easy; just create a group_users table that links users into groups. (If you're a stickler for normalization then you may create a group table that just lists off groups and then have a group_users table that links them together -- that's fine too)

Here's where it gets tricky. The client wants groups to also contain groups, to arbitrary depth and with arbitrary overlap (groups can be in multiple groups, and groups can contain multiple groups). That's easy enough to store (with a group_groups table), but it's difficult to query against without some sort extension like Oracle's CONNECT BY.

This recursive hierarchy also needs to be retroactive -- meaning that if group A contains group B, and group B is modified, then group A will be modified too -- so I can't cheat and just flatten out the structure. If you don't believe me that it can't simply be flattened, consider this situation. You have a group called "cool people" which contains users 1 and 2. Someone creates a group called "REALLY cool people" which contains user 3 and contains the group "cool people". When I query against "REALLY cool people" I should conclude that users 1, 2, and 3 are in the group. Now say that someone decides that user 2 isn't a cool person anymore and removes user 2 from "cool people". After that point in time, "REALLY cool people" only contains users 1 and 3. If I had originally flattened out the structure, I wouldn't know to remove user 2 from "REALLY cool people" when I removed him from "cool people".

So a trivial flattening won't work in this scenario. Other options I've considered:

  • Doing the recursion in code.
    • Too slow for complex groups, and also requires you to then do related joins in memory rather than on the database
  • Flatten the structure out into group_users_flattened, but also maintain a group_groups table. Create a trigger for group_users_flattened on INSERT/UPDATE/DELETE that will go to the group_groups table, find all groups that contain this group, and dynamically make the appropriate changes to group_users_flattened.
    • I can imagine this working, but it seems convoluted and error-prone, and I have a feeling there's a gotcha that I'm not seeing.

Are there other ideas that I haven't considered?

ean5533
  • 8,884
  • 3
  • 40
  • 64
  • 1
    Can a group be in multiple other groups? i.e can `cool people` be in `REALLY cool people` and `slightly cool people`? – Brendan Bullen Jul 29 '11 at 14:19
  • Yes, I forgot to mention that. Arbitrary depth AND arbitrary overlap. I'll add that to the question. – ean5533 Jul 29 '11 at 14:23
  • 1
    That's a tricky one. And, to complicate things further, can it be circular? i.e. Could it be that `cool people` is part of `slightly cool people`, `slightly cool people` is part of `REALLY cool people` and `REALLY cool people` is part of `cool people`? – Brendan Bullen Jul 29 '11 at 14:29
  • Hm, that's a problem I hadn't even considered yet. In theory... yes, I suppose the client may very well want to allow circular relationships and just have the code be smart enough to stop looking when it detects that it has looped back. This gets worse the more I look at it. – ean5533 Jul 29 '11 at 14:32
  • Based on the complexity, I'd say you're going to have to use the structure you've proposed (a `groups_groups` table) and handle the rules and behaviour in code... – Brendan Bullen Jul 29 '11 at 14:35
  • Are you stuck with MySQL? PostgreSQL and Firebird are both free as well and offer recursive common table expressions which are essentially the same as Oracle's CONNECT BY. –  Jul 29 '11 at 17:34
  • @a_horse_with_no_name I'm stuck on a 3rd party web host that offers either MySQL or SQL Server. Switching to PostgreSQL was the first thing that came to mind but it's not an option. – ean5533 Jul 29 '11 at 18:04
  • @ean5533: SQL Server supports recursive common table expressions as well. –  Jul 29 '11 at 18:31

7 Answers7

2

See my answer to What is the most efficient/elegant way to parse a flat table into a tree?. I describe a design I call Closure Table.

In your case, you'd have your tables Users and Groups and UserGroupMembers that is the intersection table (many-to-many) between Users and Groups.

Then you'd need another table to describe how groups are nested. Call it SubgroupPaths for instance. This records every path relating a given group to its subgroups.

CREATE TABLE SubgroupPaths (
  supergroup INT UNSIGNED NOT NULL,
  subgroup   INT UNSIGNED NOT NULL,
  pathlength SMALLINT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (supergroup, subgroup),
  FOREIGN KEY (supergroup) REFERENCES Groups(groupid),
  FOREIGN KEY (subgroup) REFERENCES Groups(groupid)
);

You may also need some permutations of compound indexes to support certain queries you'd run against this table.

This design allows you to have multiple hierarchies, so you could have group "cool people" be a descendant of each of its supergroups.

INSERT INTO Groups (groupid, groupname) VALUES
(1, 'REALLY cool people'),
(2, 'slightly cool people'),
(3, 'cool people');

INSERT INTO SubgroupPaths (supergroup, subgroup, pathlength) VALUES
(1,1,0), (2,2,0), (3,3,0), -- every group points to itself
(1,3,1), -- REALLY is parent of cool people
(2,3,1); -- slightly is also parent of cool people

Now you can get the list of all users who should be considered cool people, regardless of whether they are members of cool people, slightly cool people, or REALLY cool people. We can even throw in a DISTINCT in case some users are associated with more than one of these groups.

SELECT DISTINCT u.*
FROM SubgroupPaths AS cool
JOIN SubgroupPaths AS supercool ON cool.subgroup=supercool.subgroup
JOIN Groups AS g ON supercool.supergroup = g.groupid
JOIN UserGroupMembers AS m ON m.groupid = g.groupid
JOIN Users AS u ON u.userid = m.userid
WHERE cool.subgroup = 3;

I prefer Closure Table over the Nested Sets design suggested by other answers, because Closure Table supports referential integrity constraints, and there are some queries that are hard in Nested Sets but easier in Closure Table.

For more on Closure Table, check out my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.


Footnote to all this: be careful about violating the YAGNI principle.

I once implemented a database to store arbitrary-depth groups like this, and all the PHP code to display, report, and administer hierarchies. Also I had to clone hierarchical collections when they were used, because the hierarchy could be reorganized later, and we needed to keep historical data of how the elements in the hierarchy were used. It took weeks to code and test. And when it was all done, the user of the application never actually stored any hierarchy more one one level deep.

Some decision-makers will change their minds about the scope of requirements if you tell them how much work (i.e. budget) it will take to implement and test.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • This is the most usable solution that has been posted. The only problem left is keeping `SubgroupPaths` updated. Because the groups aren't static, there will have to be a stored proc that rebuilds `SubgroupPaths` whenever a user adds/removes a group-group nesting. However, that's certainly easier to do than trying to maintain a flattened table of group users. I think this is the best possible answer, though like you suggested I think that YAGNI is going to be the final conclusion when I give my quote to the client. – ean5533 Aug 05 '11 at 17:20
  • @Bill Karwin: I fail to see how this solves the problem of nested groups. If I can put a child group in a parent group, i can also put an intermediary group in between parent and child, and then link from intermediary to child (in addition to parent to child). Then your methods fails with a primary-key violation... – Stefan Steiger Dec 12 '16 at 10:58
  • @StefanSteiger, yes, of course you have to update some paths if you insert or delete new nodes or move them around. But the point is that it's a lot easier to query for trees, assuming the paths are correct. The advantage over Nested Sets is that the Closure Table design can enforce references using foreign key constraints, but Nested Sets cannot. – Bill Karwin Dec 12 '16 at 16:08
1

"Queries using nested sets can be expected to be faster than queries using a stored procedure to traverse an adjacency list, and so are the faster option for databases which lack native recursive query constructs, such as MySQL."

http://en.wikipedia.org/wiki/Nested_set_model

https://docs.joomla.org/Using_nested_sets

Drawback However inserting a new node (row) will require all rows to be updated accordingly

Harry Bosh
  • 3,611
  • 2
  • 36
  • 34
0

I'd use nested sets. Full details here:

http://www.alandelevie.com/2008/07/12/recursion-less-storage-of-hierarchical-data-in-a-relational-database/

Although I've never used that to represent overlapping.

Deleted
  • 4,804
  • 1
  • 22
  • 17
  • I'm not sure any of the 3 methods suggested in the article can work in this case (as you've highlighted, the problem is the overlapping grouping) – Brendan Bullen Jul 29 '11 at 14:33
  • Ok in that case, I'd simply do the whole thing in code and keep parent-child relationships in the DB. – Deleted Jul 29 '11 at 15:02
0

Could you have a table of users_groups (with a column for each row to distinguish between entries for users and entries for groups) and a separate multi-multi junction table listing all of the user_group_memberships?

I'm guessing that the junction table would need a constraint to ensure that the groups column is both an FK into the first table and that the type is of group. (In other words if the junction table has two columns: member_ID and group_ID then member_ID can be a reference to either a member or a group while group_ID must refer only to a group.

That would allow any user or group to be including in the membership of any group while preventing any user or group from being a "member" of a user.

(BTW: I'm not sufficiently well-versed in MySQL to cook up a working example right now; but I'd like to see one if this suggestion is feasible).

Jim Dennis
  • 17,054
  • 13
  • 68
  • 116
  • Forgive me if I'm missing something obvious, but it doesn't look like your solution gives me something easily joinable, which is what I really need. I need to be able to join against X using group_id as the joining attribute so that I can get related information to the users in that group. Forgive me if I'm misunderstanding your solution. – ean5533 Jul 29 '11 at 14:52
0

What about a structure such as

enter image description here

A relationship such as really cool people being related to cool people as 'chained' (and thus the appropriate cascade), and vice-versa for free.

Nick
  • 2,872
  • 3
  • 34
  • 43
0

Have you considered a self-referential structure in your groups table? Say you put in a column called "superclass." Just like OOP, subclasses inherit from superclasses. Then give it an ID column, so that you have:

[ ID | Group name | Whatever other columns | superclass ]

And a foreign key constraint between ID and superclass.

This way, say you have group heffalump, ID = 3. Its superclass could be 1, where ID of 1 corresponds to group name winniethepooh.

Say Woozle has ID of 4. It can also have superclass 1. So it would still be under winniethepooh.

Fairly simple, but it should work without much trouble. This way, as per your example, "really cool people" would be hierarchical under "cool people," so that the only people in "really cool people" who are NOT in "cool people" would be those that aren't already in "cool people" to begin with. So if you take a person out of "cool people" he wouldn't be under "really cool people," but if you take a person out of "really cool people" it won't affect "cool people."

Sorry about the complicated explanation, and I hope this helps!

*edit: I notice this is essentially the first example given in the other link. In that case, I'm all out of other ideas. Sorry!

Harvest Zhang
  • 23
  • 1
  • 4
  • This doesn't allow for groups with multiple parents. ("silly people" can be a child of "people" and of "silly things") More importantly, it still also requires a recursive JOIN which can't be done in MySQL. – ean5533 Aug 05 '11 at 17:12
0

I would look into using a Common Table Expression (CTE) for recursion. In my experience, it is the most efficient way to query hierarchical data in SQL Server.

Here is a link which explains how to use CTEs: http://msdn.microsoft.com/en-us/library/ms190766.aspx

And here is a simple example of how to use a CTE to query a hierarchy. You will obviously have to adjust the code for your application, but this should point you in the right direction.

WITH Groups AS
(
   --initialization
   SELECT ParentGroups.GroupID, ParentGroups.GroupName, ParentGroups.ParentGroupID
   FROM ParentGroups
   WHERE ParentGroups.ParentGroupID IS NULL
   UNION ALL
   --recursive execution
   SELECT SubGroups.GroupID, SubGroups.GroupName, SubGroups.ParentGroupID
   FROM Groups SubGroups INNER JOIN Groups ParentGroups 
   ON SubGroups.ParentGroupID = ParentGroups.GroupID
)
SELECT * FROM Groups

Also, you shouldn't need to have a group_groups table. You can hold the entire hierarchy in the group table by adding a ParentGroupID column.

James Johnson
  • 45,496
  • 8
  • 73
  • 110
  • Will this work for `MySQL` which is the requirement in this case? Also, the groups can be associated with multiple groups so a `ParentGroupID` column wouldn't work. – Brendan Bullen Aug 05 '11 at 15:46
  • Sorry, I must have missed that. I'm not sure if mySQL supports CTEs. There is a thread discussing alternatives to CTEs in mySQL, though: http://stackoverflow.com/questions/1382573/how-do-you-use-the-with-clause-in-mysql – James Johnson Aug 05 '11 at 15:53
  • At the moment I'm stuck on MySQL. In theory we could migrate to SQL Server, but it would be a lot of work just to solve this single issue. Also, as Brendan already mentioned, a `ParentGroupID` wouldn't work. A group can have multiple parent groups as well as multiple child groups. – ean5533 Aug 05 '11 at 17:08