2

I'm new in this domain and like to write an application managing genealogical data. My main concern is how to store and retreive these data from MySQL. I know that DB like Oracle are optimised for recursive queries, but maybe I can find an alternative solution to use MySQL which I undestand is not supporting "CONNECT" . PS. I know that there are thousands of existing Open source solutions, but consider that these data will be a limited part of the functionality, and I need to keep control of the full code.

I had a quick look on the web and found some interesting approach, for instance Interval-based algo which is perfect for queries but not satisfactory for update / deletion.

I'm going to have a look on Prefix-based(Dewey) approach, but one may know an efficient and proven approach to share ?

thanks

gilles

gilles
  • 45
  • 1
  • 4
  • the following answer may be helpful http://stackoverflow.com/questions/5291054/hierarchical-sql-problem/5291159#5291159 – Jon Black Jan 22 '12 at 15:15

2 Answers2

3

First problem, design data schema: I keep hierarchis with a foreign key to parent row. It is simply.

Second problem, retrieve ascendants/descendants: As you explain, problems comes with select: select some people and all descendants os ascendants. To solve this you should to create a new tree table. This table contains pairs: al combination to a person with all they ancestors (and itself):

people( id, name, id_parent)
people_tree( id, id_ancestor, distance )

Noticie that with this structure is easy to query hierarchies. Sample: all descendants of somebody:

select people.*, distance
from 
  people p
    inner join 
  people_tree t 
    on ( p.id = t.id)
where
  id_ancesor = **sombody.id **

You can play with distance to get only grandparents, grandwchildren, etc. ...

Last problem, keep tree: tree must be all time up to data. You should automatize this: a trigger over people or a store procedure for CRUD operations,

EDITED

Because this is a Genealogy tree, each person must to have both references, parent and mother:

people( id, name, id_parent, id_mother)

Then, 2 trees are needed:

parent_ancestors_tree( id, id_ancestor, distance )
mother_ancestors_tree( id, id_ancestor, distance )

David ask for sample data:

people: id    name    id_parent    id_mother
         1    Adam         NULL      NULL
         2    Eva          NULL      NULL
         3    Cain            1         2
        ..    ...
         8    Enoc            3         5

parent_ancestors_tree id    id_ancestor  distance
              (Adam)   1              1         0
              (Eva)    2              2         0
              (Cain)   3              3         0
                       3              1         1
              (Enoc)   8              8         0
                       8              3         1
                       8              1         2

mother_ancestors_tree id    id_ancestor  distance
              (Adam)   1              1         0
              (Eva)    2              2         0
              (Cain)   3              3         0
                       3              2         1
              (Enoc)   8              8         0
                  -- here ancestors of Enoc's mother --

regards.

dani herrera
  • 48,760
  • 8
  • 117
  • 177
  • I have one question : how do you segregate the different families (additional id_family attribute or always rebuild the tree, which means recursive access) ? gilles – gilles Jan 22 '12 at 17:41
  • @danihp: A sample would be nice. What do you mean with distance? What is an ancestor? – Micromega Jan 22 '12 at 17:41
  • @david, sorry about my english. Distance: 0 between a `person` and itself, 1 between a `person` and parent, 2 between a `person` and grandparent, ... . [Ancestor](http://dictionary.reference.com/browse/ancestor): a person from whom one is descended; forebear; progenitor. – dani herrera Jan 22 '12 at 17:53
  • @gilles, segregate families: nice question, my solution is for a simple hierarchy, a family is more complex because people has 2 progenitors, then each people should have 2 foreign keys (parent and mother), this add complexity to solution (perhaps 2 people_tree are needed for each people). You should avoid rebuild the tree, on insert people you should copy parents tree (changing id parent by new person `id`) and add both entries (for inserted `person` and it parent and for inserted `person`and itself). Nobody says that this will easy. – dani herrera Jan 22 '12 at 18:01
  • @david, sample data posted. regards. – dani herrera Jan 22 '12 at 19:51
  • thanks Danihp, quite precise answer... I thought about it and have 2 questions, why do you split the two tree tables ? it should work the same in a single one ? Since I'll need to consolidate the full tree (including mother and fathers) I'll certainly have to make a join ... Secondly, since I want to avoid the use of the DB for the tree construction (recursive cost in MySQL), and use instead the application layer, I think the Tree table contains sufficient info to rebuild it, do you agree? – gilles Jan 24 '12 at 21:33
  • I'll add also a treeId column to manage what I wrongly called "families" but are instead "unconnected sub-trees". In summary: one person table where I get person' data, and one tree table where I store each element relation (including the mother-father one). do you see any issue/risk ? – gilles Jan 24 '12 at 21:38
  • @guilles, first question: I purpose 2 tables: first one for parent and parent ancestors and second one for mother and mother ancestors. When you search for someone's ancestors you should do a [Union](http://dev.mysql.com/doc/refman/5.0/en/union.html) of both (it is easy). Second one: Make recursion in application layer it is also a possility. Because you have patents foreign keys you can do this easily. Also, it is possible reconstruct people_tree when you need. – dani herrera Jan 24 '12 at 21:43
  • Gilles, I don't know what kind of application are you designing: how many people (hundreds? milions of people?) How many requests by second? How many inserts / update / deletes by minute? How many levels of hierarchy you need to retrieve at a time? ... – dani herrera Jan 24 '12 at 21:43
  • "do you see any issue/risk ? " I think that is a bad idea, sorry. Each people has (may have) a parent and a mother, then it is a 1:N relationship between people (parent) and people (children) and 1:N relationship between people (mother) and people (children). But this is your design and your application. You know more about it than I. You should find a solution confortable to be managed by you. – dani herrera Jan 24 '12 at 21:47
  • @gilles, is this answer a solution for your issue? If this helps up vote answer and if it is a solution check answer as solution, do you know how to do this? – dani herrera Jan 25 '12 at 22:43
  • @danihp. Hi, yes it helped me. I'm trying a version partially based on your input. How can I vote for this ? – gilles Mar 15 '12 at 14:24
  • If this post is a solution for you then you should accpet it. Learn hot [accept an answer as solution](http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) . – dani herrera Mar 15 '12 at 14:33
1

I would also recommend an adjacent tree model and for the more complex logic I suggest to use simple mysql queries (Joins). Most likely it's more important to create the tree. You can do more data mining when the application is finished and everything went ok.

Micromega
  • 12,486
  • 7
  • 35
  • 72