MSSQL Grouping by Nested Values

Question

I need to group by a series of nested relationships. Use the following example. Assuming each persons name is unique to the database.

Person | Sibling 1 | Sibling 2
-------+-----------+-----------
Jason  | Brad      | Sheri
Brad   | Sheri     | Jason
Sheri  | Brad      | Tina
Tina   | Sheri     | Sam
Sam    | Kara      | Tina
Kara   | Sam       | Tina
James  | Kelly     | NULL
Kelly  | James     | NULL
Fred   | NULL      | NULL

How would I write the query to get this result?

Person | Family
-------+--------
Jason  | 1
Brad   | 1
Sheri  | 1
Tina   | 1
Sam    | 1
Kara   | 1
James  | 2
Kelly  | 2
Fred   | 3

Ideally with out the use of external code such as CLR or CTEs.

EDIT: The following output is also acceptable.

Family | Siblings
-------+-------------------------------------
1      | Jason, Brad, Sheri, Tina, Sam, Kara
2      | James, Kelly
3      | Fred

You would help us (and yourself) a great deal if you explain the logic behing column `Family`. E.g. why does Fred get `Family`=`3`, even though he occurs just once in the Person table? — Peter B, Dec 19 '18 at 11:46
Because Fred does not have any siblings therefor he must be an only child of his own family. I need to build this family group by evaluating all the member of the family and then grouping them all together — BKSwindell, Dec 19 '18 at 11:47
I still don't understand, please try to explain it better. What does what you just said have to do with the value **`3`** ? — Peter B, Dec 19 '18 at 11:48
This is more complex than it seems and it's a problem of graphs. Please have a look at this answer https://stackoverflow.com/questions/35254260/how-to-find-all-connected-subgraphs-of-an-undirected-graph — EzLo, Dec 19 '18 at 11:49
I was hoping to do this with out any cursors or CTE's. Do you think its possible? — BKSwindell, Dec 19 '18 at 11:52
The family values are auto numbers by the group/partition. The actual values are irrelevant as long as they are unique per "family". — BKSwindell, Dec 19 '18 at 11:54
@BKSwindell I don't think it's possible to solve with a simple query, unless you use graph functionalities available on SQL Server 2017 (which requires specific table and data types). — EzLo, Dec 19 '18 at 11:56
We are working with SQL 2008 so that is not possible. Ok i will need to figure out how we would do it with the CTE. We are working with a dataset of 500k rows and need to make sure it operates very quickly. My experense with CTEs and Cursors are that they are very slow. — BKSwindell, Dec 19 '18 at 11:57
is `Tina` sibling for `Jason` ? Let us know this login of your desire output. — Pugal, Dec 19 '18 at 12:05
Yes Tina is related to Jason because she is related to Sheri who is related to Jason. — BKSwindell, Dec 19 '18 at 12:07
Essentially we need to build a list of all siblings that are related as @EzLo suggesst in his link. If i could build 3 CSV strings of "Jason, Brad, Sheri, Tina, Sam, Kara" and then "James, Kelly", and finally "Fred" that would be the first step. The rest would be easy. — BKSwindell, Dec 19 '18 at 12:12
Is there a known and guaranteed upper limit on family members? — iamdave, Dec 19 '18 at 12:12
Im not sure yet as i have not been able to evaluate the data to that level. But I'm assuming there are not more then 10 or so. — BKSwindell, Dec 19 '18 at 12:14
@BKSwindell this is a graph problem. If, and only if there are no cycles, you could treat it as a hierarchy, and add a `hiearchyid` to represent families and relations. CTEs aren't slow - they are nothing more than subqueries that can also be used recursively. They are slow if the query is slow, if there are missing indexes etc. `hierarchyid` removes the need for recursion if you can model your relations as a DAG — Panagiotis Kanavos, Dec 19 '18 at 12:37
@BKSwindell you can use CTEs to generate the hierarchyid values needed to improve performance in this table. Once the migration is finished, you can update the `hierarchyid` column each time you modify a row — Panagiotis Kanavos, Dec 19 '18 at 12:38

score 0 · Answer 1 · answered Dec 19 '18 at 18:28

Here is my approach to your question (using CTE):

declare @tb table (Person varchar(10), Sibling1 varchar(10), Sibling2 varchar(10))
insert into @tb values
('Jason','Brad', 'Sheri'), ('Brad', 'Sheri','Jason'), ('Sheri','Brad', 'Tina'),
('Tina', 'Sheri','Sam'),   ('Sam',  'Kara', 'Tina'),  ('Kara', 'Sam',  'Tina'),
('James','Kelly',NULL),    ('Kelly','James',NULL),    ('Fred', NULL,   NULL)

with b
as (select
     Person
    ,Related = (select COUNT(*) from @tb where a.Person in (Sibling1, Sibling2))
    from @tb a
    )
select c.Person, Family = 
    case (select SUM(b.Related) from b where b.Person in (c.Person, c.Sibling1, c.Sibling2))
        when 0 then 3
        when 2 then 2
        else 1
    end
from @tb c

MSSQL Grouping by Nested Values

1 Answers1