2

I have a database of dogs. Each dog has a sire and dam parent. I am not able to use CTE for this due to an issue with phpmyadmin and MariaDB 10.0.

I updated to MariaDB 10.2.20 to use CTE. Still getting "Unrecognized Statement type. (near WITH) in phpMyAdmin

Table is animal

Columns I am working with are: id, akc_reg_num, akc_parent_sire, akc_parent_dam

I have tried to get just one generation as I would for loading subcategories, but this is only one side when there are two parents.

I have attempted this solution, but can't get my head wrapped around the two parents of each sibling. How do I select only one generation of a hierarchical tree using an table parent child relation with SQL?

Example Table

CREATE TABLE  `animal` ( 
    `id` INT(11) NOT NULL AUTO_INCREMENT ,
     `akc_reg_num` VARCHAR(20) NOT NULL ,
     `akc_parent_sire` VARCHAR(20) NOT NULL ,
     `akc_parent_dam` VARCHAR(20) NOT NULL ,
     PRIMARY KEY (`id`)
) ENGINE = MyISAM;

INSERT INTO `animal` (`id`, `akc_reg_num`, `akc_parent_sire`, `akc_parent_dam`) VALUES
(NULL, '1', '2', '3'), 
(NULL, '2', '5', '6'), 
(NULL, '3', '9', ''), 
(NULL, '5', '', ''), 
(NULL, '6', '7', '8'), 
(NULL, '7', '', ''), 
(NULL, '8', '', ''), 
(NULL, '9', '10', '11'), 
(NULL, '10', '', ''), 
(NULL, '11', '12', ''), 
(NULL, '12', '', '');

Code:

include_once("db_conx.php");   

function getPedigree($node) { 
    // look up the parent of this node  
    $sql =  'SELECT akc_parent_sire, akc_parent_dam FROM animals WHERE akc_reg_num="'.$node.'";';
    $query = $db->prepare($sql);          
    $query->execute();
    $path = array();    
    while($row=$query->fetch(PDO::FETCH_ASSOC)){    
        if ($row['akc_parent_sire']!='') { 
            $path[] = $row['akc_parent_sire']; 
            echo $row['akc_parent_sire'];
            $path = array_merge(getPedigree($row['akc_parent_sire']), $path); 
        } 
        if ($row['akc_parent_dam']!='') { 
            $path[] = $row['akc_parent_dam']; 
            echo $row['akc_parent_dam'];
            $path = array_merge(getPedigree($row['akc_parent_dam']), $path); 
        } 
    }       
    return $path; 
} 
print_r(getPedigree('vvv'));

I need to loop through each generation so I can return a json array and then use javascript to tie the results to DOM elements. I only need to query for 4 generations however my concern is the expense of cpu cycles. Once I have a few hundred thousand animals in the database how efficient will this same query be?

mickmackusa
  • 43,625
  • 12
  • 83
  • 136
user1956040
  • 111
  • 2
  • 12
  • 1
    Possible duplicate of [How to create a MySQL hierarchical recursive query](https://stackoverflow.com/questions/20215744/how-to-create-a-mysql-hierarchical-recursive-query) – Raymond Nijland Mar 24 '19 at 19:07
  • Thanks Raymond. I did in fact attempt this last night but it only has one parent in the row. I have two parents in the row. – user1956040 Mar 24 '19 at 19:43
  • Graph Databases exist for a reason. –  Mar 24 '19 at 22:14
  • @user1956040 You have not specified your EXACT desired output for your sample data. – mickmackusa Mar 24 '19 at 23:33
  • The way I am approaching this is as I move up the generations I can add onto the json array with where I am with other needed demographics for the DOM. For example I start off with offspring but move to the sire and dam. I will use 's' for sire and 'd' for dam . 2nd generation I will identify the location/breadcrumbs with an 's' and 'd'. 3rd generation will amend their sex to the breadcrumb and 'ss' and 'sd' for those related to the sire and 'ds' and 'dd' for those relations to the dam. 5th generation goes from 'ssss' ... 'dddd', 16 possible combinations in that generation. – user1956040 Mar 24 '19 at 23:57
  • Please only add question details to your question. Future researchers will not want to comb through the comments to get the "full story". Please clarify your question and risk gaining upvotes. – mickmackusa Mar 25 '19 at 00:40
  • Let's see the `WITH` that is giving you trouble. And, did you try the query from the commandline tool `mysql` to check whether the problem is with phpmyadmin or MariaDB? – Rick James May 07 '19 at 04:29
  • I can actually run the command from command line. There is an issue with phpmyadmin. – user1956040 May 08 '19 at 20:14

1 Answers1

2

To prevent abusing your database with iterated calls, SELECT the whole table just once and let php do all of the recursive work on the result set.

AMENDMENT: Since collecting the ~100,000 rows is too much heavy lifting, here is alternative advice... Rather than making up to 31 individual trips to the database within a recursive process, I'll recommend that you build a filtered array based on up to 5 trips to the database.

The following snippet is NOT tested:

$generation = 1;
$needles = [1];
$animals = [];
while ($needles && $generation < 6) {
    $sth = $db->prepare("SELECT * FROM animals WHERE akc_reg_num IN (" . implode(',', array_fill(0, count($needles), '?')) . ")");
    $sth->execute($needles);
    if ($results = $sth->fetchAll(\PDO::FETCH_ASSOC)) {
        $needles = array_filter(array_merge(array_column($results, 'akc_parent_sire'), array_column($results, 'akc_parent_dam')));
        $animals[] = array_merge($animal, $results);
    } else {
        $needles = null;
    }
    ++$generation;
}
// $animals is ready to pass to the php recursion

From an $animals result set like this:

$animals = [
    ['id' => 1, 'akc_reg_num' => 1, 'akc_parent_sire' => 2, 'akc_parent_dam' => 3],
    ['id' => 2, 'akc_reg_num' => 2, 'akc_parent_sire' => 5, 'akc_parent_dam' => 6],
    ['id' => 3, 'akc_reg_num' => 3, 'akc_parent_sire' => 9, 'akc_parent_dam' => 0],
    ['id' => 4, 'akc_reg_num' => 5, 'akc_parent_sire' => 0, 'akc_parent_dam' => 0],
    ['id' => 5, 'akc_reg_num' => 6, 'akc_parent_sire' => 7, 'akc_parent_dam' => 8],
    ['id' => 6, 'akc_reg_num' => 7, 'akc_parent_sire' => 0, 'akc_parent_dam' => 0],
    ['id' => 7, 'akc_reg_num' => 8, 'akc_parent_sire' => 0, 'akc_parent_dam' => 0],
    ['id' => 8, 'akc_reg_num' => 9, 'akc_parent_sire' => 10, 'akc_parent_dam' => 11],
    ['id' => 9, 'akc_reg_num' => 10, 'akc_parent_sire' => 0, 'akc_parent_dam' => 0],
    ['id' => 10, 'akc_reg_num' => 11, 'akc_parent_sire' => 12, 'akc_parent_dam' => 0],
    ['id' => 11, 'akc_reg_num' => 12, 'akc_parent_sire' => 0, 'akc_parent_dam' => 0]
];

Itemized tasks:

  • Search your array for the row of the targeted akc_reg_num, then remove that row from the "haystack" to prevent the possibility of infinite recursion, then break the search loop for best efficiency
  • If there is no matching akc_reg_num in the haystack, return an empty array
  • If there is a matching akc_reg_num, loop the haystack and recurse all found parents. I am filtering out "deadends" to keep the result array small and clean.
  • If both parents are found in a given generation, break the loop to prevent needless iterations.
  • The recursion should continue until the generation count exceeds 4 or there are no more parents to collect.

Code: (Demo)

function buildPedigree($haystack, $akc_reg_num, $generation = 0) {
    ++$generation;
    foreach ($haystack as $index => $row) {
        if ($row['akc_reg_num'] == $akc_reg_num) {
            $result = ['sire' => $row['akc_parent_sire'], 'dam' => $row['akc_parent_dam']];
            unset($haystack[$index]);             // reduce the haystack to improve efficiency and avoid infinite loop
            break;                                // stop searching
        }
    }
    if (!isset($result)) {
        return [];  // $akc_reg_num not found
    }

    foreach ($haystack as $row) {
        if ($row['akc_reg_num'] == $result['sire']) {
            $result['sire_parents'] = array_filter(buildPedigree($haystack, $row['akc_reg_num'], $generation));  // recurse and purge empty parent arrays
            if (array_key_exists('dam_parents', $result)) {
                break;  // both parents found in generation, stop this loop
            }
        } elseif ($row['akc_reg_num'] == $result['dam']) {
            $result['dam_parents'] = array_filter(buildPedigree($haystack, $row['akc_reg_num'], $generation));  // recurse and purge empty parent arrays
            if (array_key_exists('sire_parents', $result)) {
                break;  // both parents found in generation, stop this loop
            }
        }
    }
    return $generation <= 4 ? $result : [];
} 

var_export(buildPedigree($animals, 1));

Output:

array (
    'sire' => 2,
    'dam' => 3,
    'sire_parents' => array (
        'sire' => 5,
        'dam' => 6,
        'dam_parents' => array (
            'sire' => 7,
            'dam' => 8,
        ),
    ),
    'dam_parents' => array (
        'sire' => 9,
        'sire_parents' => array (
            'sire' => 10,
            'dam' => 11,
            'dam_parents' => array (
                'sire' => 12,
            ),
        ),
    ),
)
mickmackusa
  • 43,625
  • 12
  • 83
  • 136
  • 1
    That is the output I was looking for. I am not able to select * from animal to create a result set due to the fact I will soon have over 100k animals. When does it make sense to just hit the sql server with multiple calls rather then create a large result set? I have about 8 demographics with each of these dogs. Also I am not able to change from varchar to int due to real registration numbers having alphanumeric characters. Just wanted to keep it simple. – user1956040 Mar 25 '19 at 01:02
  • A full pedigree will have 30 dogs. – user1956040 Mar 25 '19 at 01:06
  • I think if you expect to be generating this 4-gen data every time a user loads a page, you are going to be better served by creating a separate table called `pedigree` containing only `id` (which is `animal`.`id`) and `pedigree` (which is the json string containing the generated 4-gen array). This can be a procedure that is run as soon as a new dog hits your system and never again because a dog can't change its ancestory. I don't currently have time to refactor my answer to use recursive queries, let me know if you get stuck trying to refactor it. – mickmackusa Mar 25 '19 at 01:10
  • If you get a new dog that belongs to a pre-existing pedigree "chain", you'll just need to repair the existing affiliated pedigree chains. – mickmackusa Mar 25 '19 at 01:12
  • I cant think you enough. Spent three days trying different approaches settling on the php solution above that I could not get to work but it looked correct. Again thanks for your help. – user1956040 Mar 25 '19 at 01:23
  • Do you know how to mark an answer as "helpful"? You are not meant to leave thank you comments. – mickmackusa Mar 25 '19 at 01:24
  • You can actually avoid recursive queries and perform a maximum of 5 trips to the database like this: https://www.db-fiddle.com/f/ePTSJpyBAGBXQ6q1WEL996/0 Then you will have built up the same amount of data in my `$animals` array by merging the result sets. Using recursive queries will lead to more total trips to the db. Do you know how to "connect the dots" with what I am saying? – mickmackusa Mar 25 '19 at 01:31
  • Making upto 5 total trips to the database is far, far better than (if my calculations are correct) upto 31 trips to the database. – mickmackusa Mar 25 '19 at 01:50
  • I've added an untested 5-gen query process to my answer. – mickmackusa Mar 25 '19 at 02:14
  • Do you know how to "connect the dots" with what I am saying? Yes I can see where you are going. I marked this as a solution. More than helpful again. Probably save my marriage. I was getting cranky. :) – user1956040 Mar 25 '19 at 02:28
  • I have lots of rep points, so I am not "grubbing for more", but I do want to explain that you have 2 "powers" as the question asker. 1. you can upvote ANY answer that you find "helpful" (this is how you "thank" one or more volunteers who offered you support. 2. If you receive an answer that resolves your issue, you can award the green tick to the single, best resolving answer. If an answer resolves your issue AND you find it to be helpful, you can both "upvote" (+10rep) and "accept" (+15rep) the same answer -- this delivers 25 points to the volunteer that delivered satisfactory support. – mickmackusa Mar 25 '19 at 02:52
  • 1
    Done and done. Brilliant on the five queries. It actually helps with all of the back end work I need to do and easier on the CPU. – user1956040 Mar 25 '19 at 03:00
  • One last thing... be sure to adjust your table structure so that `akc_reg_num` is a `UNIQUE KEY` (in addition to `id` being `PRIMARY KEY`). This should help performance. – mickmackusa Mar 25 '19 at 03:16
  • Testing php code above I am missing the last iteration: array ( 0 => '2', 1 => '3', )array ( 0 => '5', 1 => '9', 2 => '6', )array ( 1 => '7', 2 => '10', 4 => '8', 5 => '11', ) – user1956040 Mar 25 '19 at 18:44
  • Could be a few things... 1. I am using `array_filter()` in my recursive function, so if the final step returns empty/zeroish/falsey values, they will get removed. 2. Did I get the initial db calls right? Do you need 6 sets of results? If it is neither of those things, I'll need you to provide a realistic sample of data so that I can replicate the actual issue. – mickmackusa Mar 25 '19 at 21:02
  • I ended up using the proof of concept of what you provided. Very happy with how it is going together and learned some new tricks for the tool box. Thought it would do the board good to know that I did test your code. – user1956040 Mar 25 '19 at 23:29
  • If my code isn't perfect, I would actually like to know exactly how it fails because I want future researchers to trust my posts. Thanks. – mickmackusa Mar 25 '19 at 23:32