7

Supose the families bellow:

Family Tree

The Build Schema of this is:

create table PersonConn (child int, parent int)
insert into PersonConn values (1,2)
insert into PersonConn values (1,3)
insert into PersonConn values (5,3)
insert into PersonConn values (5,4)
insert into PersonConn values (6,7)
insert into PersonConn values (6,8)
insert into PersonConn values (2,9)
insert into PersonConn values (2,10)
insert into PersonConn values (3,11)
insert into PersonConn values (3,12)

To get the ancestors of a family member I can use recursion as showed bellow:

WITH Childs AS (
    SELECT distinct Child, Parent
    FROM  PersonConn
    WHERE Child = 1
    UNION ALL
    SELECT t2.Child, t2.Parent
    FROM   [Childs] t1
    INNER JOIN  PersonConn t2
        ON  t2.Child = t1.parent
)
SELECT PARENT FROM CHILDS

SQL Fiddle

It will take all the ancestors of selected member (ID 1 in this example), but not the brothers for example. The query goes up only in family tree.

My question is:

How to get all members of a family (sons, parents, grandfathers, uncles, cousins, etc...) starting from a single person?

UPDATE

One method to solve this is a loop that inserts a person in a temporary table. After you could join PersonConn table with this temporary table and inserts other people. Do this until no one is inserted anymore. I am looking for a more efficient (and elegant) way. I have about 200MM records in PersonConn table.

Nizam
  • 4,569
  • 3
  • 43
  • 60
  • You would need to insert the root in PersonConn... Instead of selecting where CHILD = 1, you would do where PARENT IS NULL. That would give you all the members, starting from the root downwards. – Mez Jul 22 '14 at 22:50
  • Actually I want to choose a person ID (child or parent) and get all the family members connected to that person. I don't have a single root person with members bellow. I have a bunch of persons connected as parents and childs. Observe the given example I have 6 persons with no parents informed. – Nizam Jul 22 '14 at 22:56
  • You can specify the tree structure of the family - and you'd have 2 leaves having 1 parent. However how the leaves relate to each other - you would have to specify the relationship in another column. You can assume they are brothers/sisters because they are on the same level - you can calculate the depth in the CTE, however I would suggest in your case to specify the relationship. – Mez Jul 22 '14 at 23:11
  • This question briefs out what I am saying exactly - http://stackoverflow.com/questions/1781402/sql-recursive-query-to-identify-relation-between-2-users-of-family-tree – Mez Jul 22 '14 at 23:12
  • I just want to separate all persons in families. To do this I need to take a person who is not in any family and construct a family for him. Take another person and construct his family. Till I have given a family for every person in database. I don't need to specify the relationship between them. – Nizam Jul 22 '14 at 23:40

4 Answers4

1

The solution I found is not good at all. It gives the right answer but is very slow, even for this very small table.

 DECLARE @INCLUIDOS TABLE (ID INT)

 INSERT INTO @INCLUIDOS VALUES(1)

 DECLARE @PAST_QUANT INT = 0
 DECLARE @QUANT INT = 1 

 WHILE @QUANT <> @PAST_QUANT
 BEGIN

     SET @PAST_QUANT = @QUANT

     INSERT INTO @INCLUIDOS
        SELECT PARENT 
        FROM PERSONCONN 
        WHERE CHILD IN (SELECT ID FROM @INCLUIDOS)
            AND PARENT NOT IN (SELECT ID FROM @INCLUIDOS)

    INSERT INTO @INCLUIDOS
        SELECT CHILD
        FROM PERSONCONN
        WHERE PARENT IN (SELECT ID FROM @INCLUIDOS)
            AND CHILD NOT IN (SELECT ID FROM @INCLUIDOS)

    SET @QUANT = (SELECT COUNT(*) FROM @INCLUIDOS)

END

SELECT DISTINCT ID FROM @INCLUIDOS

SQL Fiddle

Nizam
  • 4,569
  • 3
  • 43
  • 60
1

In first I suggest you that use hierarchyid column for your table.

Try following query (without hierarchyid):

DECLARE @PersonId INT = 3

;WITH Parents AS (
    SELECT @PersonId AS Id
    UNION ALL
    SELECT child
    FROM PersonConn pc
    INNER JOIN Parents p ON pc.parent = p.Id
    ),
    Childs AS (
    SELECT distinct pc.Child, pc.Parent
    FROM  PersonConn pc
    INNER JOIN Parents p ON pc.child = p.Id OR pc.parent = p.Id
    UNION ALL
    SELECT t2.Child, t2.Parent
    FROM   [Childs] t1
    INNER JOIN  PersonConn t2
        ON  t2.Child = t1.parent
)
SELECT DISTINCT CASE WHEN N.n=1 THEN parent ELSE child END 
FROM CHILDS
CROSS APPLY(SELECT 1 UNION SELECT 2)N(n)

SQL Fiddle

Nizam
  • 4,569
  • 3
  • 43
  • 60
mehdi lotfi
  • 11,194
  • 18
  • 82
  • 128
  • Brilliant. Before accepting it I will try on my real data. It is a very good solution. Congratulations!!! – Nizam Jul 23 '14 at 04:33
  • The query is very nice but it doesn´t work. It takes all direct descenders and then all ancestors of these people. But, it no consider the descenders of these new ancestors. This way, if you start from @PersonID = 10, it will takes 10,2,1 with Parents, add 9,3,11,12 with Childs, but will not consider 4 and 5. This problem IS really hard. That is why I will not accept the answer. – Nizam Jul 23 '14 at 14:43
  • Actually, it only works with @PersonId equals 3. Exactly the origin you have chosen (coincidence?). – Nizam Jul 23 '14 at 14:53
0

This should work given any node. The core CTE's only work if you begin with a root child node. So the first part finds a root child if the starting person is not one. The technique is to walk up the hierarchy, then down, then back up to get everyone in a family.

DECLARE @PersonId INT = 10

-- if id passed in is not a root child, then get one
If (SELECT Top 1 Parent FROM PersonConn WHERE Child = @PersonId) is null
   WITH CHILDS AS (
      SELECT Child, 0 as [level]
      FROM PersonConn
      WHERE Parent = @PersonId
      UNION ALL
      SELECT t2.Child, [level] + 1
      FROM CHILDS t1
      INNER JOIN  PersonConn t2
      ON  t2.Parent = t1.Child
      )
   SELECT Top 1 @PersonId = Child FROM CHILDS ORDER BY [level] Desc;

WITH CHILDS AS (
   SELECT Child, Parent
   FROM PersonConn
   WHERE Child = @PersonId
   UNION ALL
   SELECT t2.Child, t2.Parent
   FROM CHILDS t1
   INNER JOIN  PersonConn t2
   ON  t2.Child = t1.parent  
   ),
PARENTS AS (
   SELECT Child, Parent
   FROM PersonConn
   WHERE Parent in (Select parent from CHILDS) 
   UNION ALL
   SELECT t2.Child, t2.Parent
   FROM PARENTS t1
   INNER JOIN PersonConn t2
   ON t2.parent = t1.child
   ),
CHILDS2 AS (
   SELECT  Child, Parent
   FROM  PersonConn
   WHERE Child in(Select child From Parents)
   UNION ALL
   SELECT t2.Child, t2.Parent
   FROM   CHILDS2 t1
   INNER JOIN  PersonConn t2
   ON  t2.Child = t1.parent 
)
SELECT DISTINCT Parent, Child  FROM CHILDS2

I certainly don't think this is elegant, and I can't imagine it will perform well. I would be interested in knowing how well it performs on your volume of data. Not sure exactly how you are going to use this in production, but I recommend creating another field and populate it to identify entire families if this is something you have to do frequently.

Luke
  • 93
  • 5
  • I would love to be able to do it, but I don´t know a priori the family of the persons, neither the people who inserts the data. The insertion process is **VERY** distributed. I will test your query and will feedback later. Thanks a lot. – Nizam Jul 23 '14 at 20:50
  • The problem of this solution, as the above one, is that you are considering that going down and up a couple of times will solve the problem. I can always construct a family which this algorithm will not capture it entirely. Of course one algorithm like yours will work 90% of times, but it still will fail 10%. – Nizam Aug 12 '14 at 15:19
  • I would try and fall back to your July 23rd answer. I simplified in another answer. Performs well for me with a small table. – Luke Aug 13 '14 at 19:06
0

Here is a simplified version of Nizam's July 23rd answer. With the PersonConn table properly indexed, I get very good results (however I have no way to test it with 200 million records). If a temp table were used instead of a table variable you could index ID, but I don't think this would improve performance much because the index would need updating after each insert.

DECLARE @INCLUIDOS TABLE (ID INT)
INSERT INTO @INCLUIDOS VALUES(5)

WHILE @@ROWCOUNT <> 0
BEGIN

INSERT INTO @INCLUIDOS
    SELECT CHILD
    FROM PERSONCONN
    WHERE PARENT IN (SELECT ID FROM @INCLUIDOS)
        AND CHILD NOT IN (SELECT ID FROM @INCLUIDOS)

INSERT INTO @INCLUIDOS
    SELECT PARENT 
    FROM PERSONCONN 
    WHERE CHILD IN (SELECT ID FROM @INCLUIDOS)
        AND PARENT NOT IN (SELECT ID FROM @INCLUIDOS)
END

SELECT  ID FROM @INCLUIDOS
Luke
  • 93
  • 5