1

I have database table where I want to get all children (nth level) of specified parent. For that, I have used answer from here which is working fine.

Here is code:

DELIMITER $$
CREATE PROCEDURE getParents(in_id INT)
  BEGIN
    DROP TEMPORARY TABLE IF EXISTS results;
    DROP TEMPORARY TABLE IF EXISTS temp2;
    DROP TEMPORARY TABLE IF EXISTS temp1;

    CREATE TEMPORARY TABLE temp1 AS
      SELECT DISTINCT * FROM agents WHERE upline_id = in_id;

    CREATE TEMPORARY TABLE results AS
      SELECT * FROM temp1;

    WHILE (SELECT count(*) FROM temp1) DO
      CREATE TEMPORARY TABLE temp2 AS
        SELECT DISTINCT *
        FROM agents
        WHERE upline_id IN (SELECT id FROM temp1);

      INSERT INTO results SELECT * FROM temp2;
      DROP TEMPORARY TABLE IF EXISTS temp1;

      CREATE TEMPORARY TABLE temp1 AS
        SELECT * FROM temp2;
      DROP TEMPORARY TABLE IF EXISTS temp2;

    END WHILE;   

    SELECT * FROM results;

  END $$
DELIMITER ;

Here is how I use it:

call getParents(2060);

Everything is fine but query runs slowly. Table also does not contain more than 10 thousand records.

Is there a way to optimize above stored procedure so that it runs a bit faster ?

I am new to MySQL so I cannot figure out how to optimize this stored procedure. Thanks for the help

Community
  • 1
  • 1
dev02
  • 1,776
  • 3
  • 24
  • 45
  • You have used so many query & query inside query....which will slow the process. – Dipanwita Kundu May 05 '16 at 07:56
  • Could you specify what you are looking for? Children and grandchildren and so on for a specific parent (which is what your code does when you put the parent in the "upline_id" and the child in "id" in your table)? The parents of a specific child (that was the problem in your link and is what your procedure name implies?) Or just the n-th-grade grandchildren of a specific parent (without all the children in between) (which was mentioned in your question but might not be meant since this "n" doesn't appear in your code)? – Solarflare May 05 '16 at 16:45
  • @Solarflare: Children of a parent but recursively – dev02 May 06 '16 at 05:39
  • @dev02 I updated my answer, you might want to try `engine=memory`, maybe it will solve your speed issues. – Solarflare May 07 '16 at 09:21

2 Answers2

4

Update: I probably forgot about the most likely optimization for you: I guess you don't use memory for temporary tables. For mysql >= 5.6 you can set default_tmp_storage_engine=MEMORY in your config (which will have the side effect of making you forget about it when you answer your next stack overflow question), or you can use engine = memory in your query if you have an earlier version or cannot or don't want to change your config. I updated the queries to use memory. If that really was your problem and you were using the hdd for your temp tables, then you might already be happy with your original code and an added engine=memory everywhere, since it will have the biggest effect of all I guess.

Creating and dropping temporary tables are expensive operations. A first optimization would be to create them once and then just to delete the contents, something like this:

DELIMITER $$

CREATE PROCEDURE getParents(in_id INT)
BEGIN

    drop table if exists temp1;
    drop table if exists temp2;
    drop table if exists results; 

    create temporary table temp2 engine=memory as (select id, upline_id from agents where upline_id = in_id); 
    create temporary table results engine=memory as (select id, upline_id from temp2); 
    create temporary table temp1 (id int, upline_id int) engine=memory;

    while (select count(*) from temp2) do 

        insert into temp1 (id, upline_id)
        select a.id, upline_id 
        from agents a
        where a.upline_id in (select id from temp2) ;

        insert into results (id, upline_id)
        select distinct id, upline_id
        from temp1;

        delete from temp2;

        insert into temp2 (id, upline_id)
        select distinct id, upline_id
        from temp1;

        delete from temp1;
    end while;    

    select a.* 
    from results r
    join agents a
    on a.id = r.id;

    drop table if exists temp1;
    drop table if exists temp2;
    drop table if exists results; 

End $$  

DELIMITER ;  

Next optimization could be to reduce the number of repetitions by joining ahead (makes the code look more complicated but should be faster) and thus doing several levels at once, and removing one temp table by saving the level n of the child:

DELIMITER $$

CREATE PROCEDURE getParents(in_id INT)
BEGIN

    set @n = 1;

    drop table if exists temp1;
    drop table if exists results; 

    create temporary table results (id int, upline_id int, n int) engine = memory;
    insert into results (id, upline_id, n) 
    select id, upline_id, @n from agents where upline_id = in_id; 

    create temporary table temp1 (id0 int, upline_id0 int, id1 int, upline_id1 int, 
                                  id2 int, upline_id2 int, id3 int, upline_id3 int) engine = memory;

    while (select count(*) from results where n = @n) do 

        insert into temp1 
        select a0.id as id0, a0.upline_id as upline_id0, 
        a1.id as id1, a1.upline_id as upline_id1, 
        a2.id as id2, a2.upline_id as upline_id2, 
        a3.id as id3, a3.upline_id as upline_id3
        from agents a0
        left outer join agents a1
        on a1.upline_id = a0.id
        left outer join agents a2
        on a2.upline_id = a1.id
        left outer join agents a3
        on a3.upline_id = a2.id
        where a0.upline_id in (select id from results where n = @n) ;

        insert into results (id, upline_id, n)
        select distinct id0, upline_id0, @n + 1
        from temp1
        where not id0 is null;
        insert into results (id, upline_id, n)
        select distinct id1, upline_id1, @n + 2
        from temp1
        where not id1 is null;
        insert into results (id, upline_id, n)
        select distinct id2, upline_id2, @n + 3
        from temp1
        where not id2 is null;
        insert into results (id, upline_id, n)
        select distinct id3, upline_id3, @n + 4
        from temp1
        where not id3 is null;

        set @n = @n + 4;

        delete from temp1;

    end while;    

    select a.* 
    from results r
    join agents a
    on a.id = r.id;

    drop table if exists temp1;
    drop table if exists results; 

End $$  

DELIMITER ;  

You can increase or decrease the number of joins. More joins will not necessarily be faster, since you might do unused joins if you have sparse data, so you might have to test a little. (It will depend on your data, the number of children per parent, and the depth you most often query from, but 3-4 might be a good starting point. You shouldn't make it too high and should test it for parents with a lot of children/grandchildren.)

But the fastest way to get your results would be to have a look at nested sets, Managing Hierarchical Data in MySQL. It's a little bit to read and to understand, but operations are way faster for nested sets (they are made for exactly your kind of problem in databases). You can have both structures at the same time in the same table (if you might need them at another place, so that's no argument against it), you just have to keep them up-to-date when you change your data. And, well, read a lot first. But it will be worth your while.

Solarflare
  • 10,721
  • 2
  • 18
  • 35
  • Am getting error `Sql Error (1136): Column count doesn't match value count at row 1` – dev02 May 06 '16 at 06:26
  • @dev02, yes it will, but so does James' solution. Could you add some example data for id/upline_id and what you expect as a result for a specific call? Or use James` example data and write down what you expect for `CALL getChildren(14);` as a result - mine would give the same results as his. We might have a different understanding of "n-level" or how you defined/understand `id` and `upline_id`. upline_id is the id of the (only) parent of a child (where 'child' is a child as seen in a tree strcuture - a "human" child might have two parents and several childs of his own - that's not meant here). – Solarflare May 06 '16 at 10:16
  • your solution gives me exact results and as many rows as there are in child parent relationship which is good but still little bit slow to be used in production. I don't understand james solution though because it does not match with columns of my table. thanks – dev02 May 06 '16 at 10:18
  • Can you please make it more faster or combine regex thing in your example or combined solution with James or something. I just want to get all children of given parent till n-th level similar to `WITH RECURSSIVE` thing in Postgres SQL – dev02 May 06 '16 at 10:21
  • @dev02: If the code you posted is the code that works for you, you just have to remove `data` in James' code and rename agentsZ to agents. Otherwise you need to post your table data. I proposed an improvement to his solution in his comments that will work without temporary tables, if you use this it will be faster than mine (at least if you dont have really big trees). Either wait for him to add it or combine it yourself (I'm not gonna "steal" his idea, and it is too long to post it completely as a comment). You can use real tables to improve speed though (the temp tables slow it down). – Solarflare May 06 '16 at 10:41
  • awesome `engine = memory;` does the job, fast now thank you very much :) – dev02 May 07 '16 at 13:49
2

(Note Solarflare suggested I got this recursion the wrong way around, see children from parents in my edit at the end.)

This is the fastest I could come up with, assuming each child has only one parent, I hope its along the right lines. Note that I changed the table name to agentsZ as there is a drop command at the beginning which will wipe out the original table if run without the Z. The reason for this is so that the stored procedure will run out of the box provided you change the table name to 'agents' and the data column name is replaced with the actual names of the columns you require (asterisk won't work).

Raw code:

# DROP TABLE IF EXISTS agentsZ;

CREATE TABLE agentsZ (id TINYINT UNSIGNED PRIMARY KEY, upline_id TINYINT UNSIGNED, `data` CHAR(8));

INSERT INTO agentsZ
VALUES (1, 4, 'A'),
(2, 3, 'B'),
(5, 8, 'C'),
(6, 7, 'D'),
(4, 9, 'E'),
(3, 9, 'F'),
(9, 12, 'G'),
(8, 11, 'H'),
(7, 10, 'I'),
(12, 13, 'J'),
(11, 14, 'K'),
(10, 14, 'L');

DELIMITER $

DROP PROCEDURE IF EXISTS getParents$

CREATE PROCEDURE getParents(in_id INT)
BEGIN

    SET @VUplineID := in_id;
    SELECT id, @VUplineID := upline_id upline_id, `data` FROM agentsZ WHERE id = @VUplineID;

END$

DELIMITER ;

CALL getParents(1);

Tested code:

mysql> DROP TABLE IF EXISTS agentsZ;
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql>
mysql> CREATE TABLE agentsZ (id TINYINT UNSIGNED PRIMARY KEY, upline_id TINYINT UNSIGNED, `data` CHAR(8));
Query OK, 0 rows affected (0.06 sec)

mysql>
mysql> INSERT INTO agentsZ
    -> VALUES (1, 4, 'A'),
    -> (2, 3, 'B'),
    -> (5, 8, 'C'),
    -> (6, 7, 'D'),
    -> (4, 9, 'E'),
    -> (3, 9, 'F'),
    -> (9, 12, 'G'),
    -> (8, 11, 'H'),
    -> (7, 10, 'I'),
    -> (12, 13, 'J'),
    -> (11, 14, 'K'),
    -> (10, 14, 'L');
Query OK, 12 rows affected (0.02 sec)
Records: 12  Duplicates: 0  Warnings: 0

mysql>
mysql> DELIMITER $
mysql>
mysql> DROP PROCEDURE IF EXISTS getParents$
Query OK, 0 rows affected (0.02 sec)

mysql>
mysql> CREATE PROCEDURE getParents(in_id INT)
    -> BEGIN
    ->
    -> SET @VUplineID := in_id;
    -> SELECT id, @VUplineID := upline_id upline_id, `data` FROM agentsZ WHERE id = @VUplineID;
    ->
    -> END$
Query OK, 0 rows affected (0.00 sec)

mysql>
mysql> DELIMITER ;
mysql>
mysql> CALL getParents(1);
+----+-----------+------+
| id | upline_id | data |
+----+-----------+------+
|  1 |         4 | A    |
|  4 |         9 | E    |
|  9 |        12 | G    |
| 12 |        13 | J    |
+----+-----------+------+
4 rows in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Slightly less elegant, but to go the other way here is another function:

DELIMITER $

DROP PROCEDURE IF EXISTS getChildren$

CREATE PROCEDURE getChildren(in_id INT)
BEGIN

    SET @VBeforeRows := -1;
    SET @VAfterRows := 0;
    SET @VDownLineIDRegex := CONCAT('^', in_id, '$');

    WHILE @VAfterRows != @VBeforeRows DO

        SET @VBeforeRows := @VAfterRows;

        DROP TEMPORARY TABLE IF EXISTS ZResults;

        CREATE TEMPORARY TABLE ZResults
        SELECT id, upline_id, IF(@VDownLineIDRegex REGEXP CONCAT('\|\^', id, '\$'), @VLoop := FALSE, @VDownLineIDRegex := CONCAT(@VDownLineIDRegex, '|^', id, '$')) idRegex, `data` FROM agentsZ WHERE upline_id REGEXP @VDownLineIDRegex;

        SELECT COUNT(*) INTO @VAfterRows FROM ZResults;

    END WHILE;

    SELECT id, upline_id, `data` FROM ZResults;

END$

DELIMITER ;

CALL getChildren(14);

Here's a copy of my output when executed:

mysql> DROP TABLE IF EXISTS agentsZ;
Query OK, 0 rows affected (0.02 sec)

mysql>
mysql> CREATE TABLE agentsZ (id TINYINT UNSIGNED PRIMARY KEY, upline_id TINYINT UNSIGNED, `data` CHAR(8));
Query OK, 0 rows affected (0.04 sec)

mysql>
mysql> INSERT INTO agentsZ
    -> VALUES (1, 4, 'A'),
    -> (2, 3, 'B'),
    -> (5, 8, 'C'),
    -> (6, 7, 'D'),
    -> (4, 9, 'E'),
    -> (3, 9, 'F'),
    -> (9, 12, 'G'),
    -> (8, 11, 'H'),
    -> (7, 10, 'I'),
    -> (12, 13, 'J'),
    -> (11, 14, 'K'),
    -> (10, 14, 'L');
Query OK, 12 rows affected (0.02 sec)
Records: 12  Duplicates: 0  Warnings: 0

mysql>
mysql> DELIMITER $
mysql>
mysql> DROP PROCEDURE IF EXISTS getChildren$
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql>
mysql> CREATE PROCEDURE getChildren(in_id INT)
    -> BEGIN
    ->
    ->     SET @VBeforeRows := -1;
    ->     SET @VAfterRows := 0;
    ->     SET @VDownLineIDRegex := CONCAT('^', in_id, '$');
    ->
    ->     WHILE @VAfterRows != @VBeforeRows DO
    ->
    ->         SET @VBeforeRows := @VAfterRows;
    ->
    ->         DROP TEMPORARY TABLE IF EXISTS ZResults;
    ->
    ->         CREATE TEMPORARY TABLE ZResults
    ->         SELECT id, upline_id, IF(@VDownLineIDRegex REGEXP CONCAT('\|\^', id, '\$'), @VLoop := FALSE, @VDownLineIDRegex := CONCAT(@VDownLineIDRegex, '|^', id, '$')) idRegex, `data` FROM agentsZ WHERE upline_id REGEXP @VDownLineIDRegex;
    ->
    ->         SELECT COUNT(*) INTO @VAfterRows FROM ZResults;
    ->
    ->     END WHILE;
    ->
    ->     SELECT id, upline_id, `data` FROM ZResults;
    ->
    -> END$
Query OK, 0 rows affected (0.01 sec)

mysql>
mysql> DELIMITER ;
mysql>
mysql> CALL getChildren(14);
+----+-----------+------+
| id | upline_id | data |
+----+-----------+------+
|  5 |         8 | C    |
|  6 |         7 | D    |
|  7 |        10 | I    |
|  8 |        11 | H    |
| 10 |        14 | L    |
| 11 |        14 | K    |
+----+-----------+------+
6 rows in set (0.13 sec)

Query OK, 0 rows affected (0.13 sec)

Regards,

James

James Scott
  • 1,032
  • 1
  • 10
  • 17
  • `SET @VUplineID := 1;` should be `SET @VUplineID := in_id;`. But as far as I understood the original procedure, I expected `getParents` to actually mean `getChildren`, since it traverses downwards, but I'm not so sure anymore... – Solarflare May 05 '16 at 16:35
  • Thanks for the pointer! I'll change that strait away. Hopefully dev02 can clarify the direction for us. – James Scott May 05 '16 at 16:38
  • I like last procedute but it finds children to one level only whereas it should be nth level. thanks – dev02 May 06 '16 at 06:37
  • It should go to n levels, let me check it over – James Scott May 06 '16 at 08:01
  • I can't reproduce the 1 level issue, see output at my end when executed, can you reproduce these results when running the same code? – James Scott May 06 '16 at 08:10
  • @James Scott: nice regexp, I'll gonna use this idea sometimes :-) You can get rid of some temp table creations: `CREATE TEMPORARY TABLE ZResults (id TINYINT UNSIGNED, upline_id TINYINT UNSIGNED, idRegex text, ``data`` char(8));` before `while`, `delete from ZResults` inside while and `insert id, ... select` instead of `create temporary table` inside `while`, since the temp tables are the thing that are slowing down the code. For small trees it should be faster than my code, since it uses less temp tables then. – Solarflare May 06 '16 at 09:51
  • Thanks Solarflare, your right about the temp tables, the optimal solution would be a combination of both solutions. – James Scott May 06 '16 at 09:54
  • @James Scott: I played with your code again, you can get rid of the temp tables completely: `select count(*) into @VAfterRows from (SELECT id, upline_id, ... ) as a` instead of `CREATE TEMPORARY TABLE ZResults select id, ...` inside the `while`, and `SELECT id, upline_id, ``data`` FROM agentsZ WHERE upline_id REGEXP @VDownLineIDRegex;` as the last select. I guess I have to think about regexps next time I have a problem that needs a temp table (though i guess it will slow down when you have large trees). Really nice idea! – Solarflare May 06 '16 at 10:07
  • sorry, I don't understand your examples,why new table and why only three fields, cannot get this to work – dev02 May 06 '16 at 10:16
  • Hi dev02, there are three fields because I don't know your table structure, so I guessed and used `data` to represent whatever else is in the table. This code is efficient but won't work by placing asterisks in the code. You need to name each field you want in the end result. to do this, wherever you see `data` in the code replace it with the fields you would like returned from your table. (lines 15 and 20 of the procedure). – James Scott May 06 '16 at 10:26
  • I tested it your solution returns 38 rows whereas they should be 436 so may be it is not working recurssively – dev02 May 06 '16 at 11:02
  • Can you run it again, then run this `SELECT CHAR_LENGTH(@VDownLineIDRegex);` and post the results? – James Scott May 06 '16 at 11:06
  • `SELECT CHAR_LENGTH(@VDownLineIDRegex);` gives `6` whereas orginal stored procedure seems to only work for first straight children not to nth level becaus it returns 36 rows instead of 400+ rows. thanks – dev02 May 06 '16 at 19:50