3

I have some kind of a tree stored in a table. It has 2 key columns id and parent_id. And some abstract data for example name and mtime. Lets say this is a file system.
I can select all children or all parents from a paticular id. (Like described in this answer)

The question is how can I update(or delete) such a subtree?
For example I want to update the modification time of some node and all of it children (or node and all it's parents upto root). Or delete this node with children. What is the best approach in terms of performance? This table can be really large 100M+ or records.

DDL

create table test (
  id int not null primary key,
  parent_id int not null,
  name varchar(100),
  mtime timestamp default current_timestamp
);
insert into test(id, parent_id, name) values(1, 0, "row1");
insert into test(id, parent_id, name) values(2, 1, "row2");
insert into test(id, parent_id, name) values(3, 2, "row3");
insert into test(id, parent_id, name) values(4, 2, "row4");
insert into test(id, parent_id, name) values(5, 4, "row5");
insert into test(id, parent_id, name) values(6, 4, "row6");
insert into test(id, parent_id, name) values(7, 6, "row7");

What makes for us this tree:

row1
|
row2--row4--row5
|     |
row3  row6
      |
      row7

--- Update Try1 ---

Tried this as Vladimir suggested:

create procedure upd_test (start_id integer)
as
begin
    WITH RECURSIVE
    CTE (id)
    AS
    (
        SELECT T.id
        FROM test AS T
        WHERE T.id = :start_id

        UNION ALL

        SELECT T.id
        FROM
            test AS T
            INNER JOIN CTE ON CTE.id = T.parent_id
    )
    UPDATE test
        SET mtime = '2001-02-03 10:11:12'
        WHERE id IN (SELECT id FROM CTE);
end

got:

Invalid token.
Dynamic SQL Error.
SQL error code = -104.
Token unknown - line 19, column 5.
UPDATE.
Community
  • 1
  • 1
Dimanoid
  • 6,999
  • 4
  • 40
  • 55
  • 1
    Could you knock together a [sqlfiddle](http://sqlfiddle.com/) with the schema and some sample data? – Steph Locke Mar 02 '15 at 11:20
  • 1
    Added schema and data. sqlfiddle does not support firebird – Dimanoid Mar 03 '15 at 12:51
  • Traditional RDB's are not very good at storing recursive data. If your are really concerned with performance, consider using some graph-oriented DB alongside with RDB. You can use it as an "index". [Neo4j](http://neo4j.com/) is a good one. – Slava Fomin II Mar 03 '15 at 12:55
  • What DB do you suggest? Keep in mind that it should be freeware and even better opensource. – Dimanoid Mar 03 '15 at 15:21
  • You need to add parent_id to the first parent query and the second query. I believe you need to make sure parent_id is available from the two unioned queries before you can reference them in joins or where clauses due to the recursive nature. – justdan23 Jul 06 '23 at 22:27

4 Answers4

2

Make sure that you have indexes on id and on parent_id for this to work efficiently.

After reading docs on Firebird (http://www.firebirdsql.org/file/documentation/reference_manuals/reference_material/html/langrefupd25-select.html#langrefupd25-select-cte)

  • The maximum recursion depth is 1024 (so, you need to check if it is enough for your data)
  • When enclosed in parentheses, CTE constructs can be used as subqueries in SELECT statements, but also in UPDATEs, MERGEs etc.

UPDATE

I have installed the latest Firebird 2.5.3 on Windows 7 64bit to test the syntax.

Based on the above, the query to update timestamp of some node (for example, with ID = 4) and all of its children to some value (for example, to 2001-02-03 10:11:12) looks like this:

UPDATE TEST SET 
MTIME = '2001-02-03 10:11:12'
WHERE ID IN
(
    WITH RECURSIVE
    CTE (id)
    AS
    (
        SELECT T.id
        FROM test AS T
        WHERE T.id = 4

        UNION ALL

        SELECT T.id
        FROM
            test AS T
            INNER JOIN CTE ON CTE.id = T.parent_id
    )
    SELECT id FROM CTE
);

I checked and it worked as expected (rows with IDs 4, 5, 6, 7 have been updated).

DELETE

The same approach, i.e.:

DELETE FROM TEST
WHERE ID IN
(
    WITH RECURSIVE
    CTE (id)
    AS
    (
        SELECT T.id
        FROM test AS T
        WHERE T.id = 4

        UNION ALL

        SELECT T.id
        FROM
            test AS T
            INNER JOIN CTE ON CTE.id = T.parent_id
    )
    SELECT id FROM CTE
);

ran without syntax errors, but it deleted only one row with id = 4. I would call it a bug.

DELETE with temporary table

The following works correctly. Create a global temporary table in advance. The temporary is only data in the table, not the table itself, so it has to be created in advance and it will remain in the database. By default the data in such temporary table will be cleaned up upon transaction end.

CREATE GLOBAL TEMPORARY TABLE ToDelete
(id int not null primary key);

Insert results of the recursive CTE into the temporary table and then use it to delete found IDs from the main table. Make sure these two statements run inside the same transaction.

INSERT INTO ToDelete
WITH RECURSIVE
CTE (id)
AS
(
    SELECT T.id
    FROM test AS T
    WHERE T.id = 4

    UNION ALL

    SELECT T.id
    FROM
        test AS T
        INNER JOIN CTE ON CTE.id = T.parent_id
)
SELECT id FROM CTE
;

DELETE FROM TEST
WHERE ID IN (SELECT ID FROM ToDelete)
;

I checked, this worked as expected (rows with IDs 4, 5, 6, 7 have been deleted).

Vladimir Baranov
  • 31,799
  • 5
  • 53
  • 90
  • Can't get this to work. Updated the question with the code and error. – Dimanoid Mar 06 '15 at 12:53
  • I have figured out correct syntax for Firebird 2.5.3. See the updated answer. – Vladimir Baranov Mar 07 '15 at 10:18
  • Thanks a lot. Now it works perfect. Cant say much about efficiency yet compared to use of temp table. I will study this later and write here when we have bigder BD. – Dimanoid Mar 09 '15 at 12:43
  • @Dimanoid, you are welcome. Performance is hard to predict. You should generate large set of data and test how different variants perform on your system and hardware. There are at least two significantly different cases. 1) there are many shallow trees, i.e. number of updated rows is small 2) there are few deep trees, i.e. number of updated rows is large. Index helps to find rows to update, but it takes time to update the index itself. – Vladimir Baranov Mar 09 '15 at 21:45
1

As you said you are able to get all the Id's you want.

So one solution would be:

  1. Select all the IDs to be updated and store them to #tmpid table

  2. Update / Delete

    UPDATE t
    SET t.mtime = GETDATE()
    FROM dbo.TreeTable t INNER JOIN #tmpid i ON t.id = i.id
    
    DELETE t FROM dbo.TreeTable t INNER JOIN #tmpid i ON t.id =i.id
    

But: NOT TESTED! Please check if this is ok with your amount of data...

To reach best performance it's always necessary to have meaningful index:

CREATE UNIQUE CLUSTERED INDEX idx_treetable_id
  ON dbo.TreeTable(id);
CREATE UNIQUE INDEX idx_treetable_unique
    ON dbo.TreeTable(id,parent_id)
CREATE NONCLUSTERED INDEX idx_parent_id
  ON dbo.TreeTable(parent_id);
GO
Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
CeOnSql
  • 2,615
  • 1
  • 16
  • 38
  • Unfortunately your answer is for SQL Server and doesn't work on Firebird; something similar could be done in Firebird, but would require a global temporary table, and use of the `MERGE` statement, as Firebird doesn't support joins in `UPDATE`. – Mark Rotteveel Mar 05 '15 at 17:14
  • @MarkRotteveel, I have figured out correct syntax for `UPDATE` (tested with Firebird 2.5.3). See my answer. – Vladimir Baranov Mar 07 '15 at 10:23
0

It's not exactly your solution, but you should try to make a recursive SELECT statement first. When you can SELECT all of the rows you wish, then you just change it to an UPDATE or a DELETE.

Here's an example of a recursive SELECT. To be able to fully test it, I would need extensive data so I guess it's best to let you try it on your own. It should be something like this.

With Recurs AS
( 
select id, name, parent_id
from test
union
select id, name, parent_id
from Recurs r
where id=r.parent_id
)
select *
from Recurs
order by id
option (maxrecursion 0)
Danielle Paquette-Harvey
  • 1,691
  • 1
  • 16
  • 31
  • Read the question first please. I said that I know how to select what I need and provided the link. – Dimanoid Mar 04 '15 at 07:08
0

If you are sure the depth of your tree is smaller than 1024, in MYSQL you could use the following procedure for this:

DELIMITER $$ 

CREATE PROCEDURE upd_test( IN start_id INT, IN str_date CHAR(20) )
BEGIN

    DECLARE next_id INT DEFAULT NULL ;

    UPDATE  test
    SET     mtime = str_date
    WHERE   id    = start_id;

    SELECT parent_id FROM test WHERE id = start_id INTO next_id;

    IF next_id > 0 
    THEN 
        CALL upd_test( next_id, str_date ); 
    END IF;

END$$

DELIMITER ;

Next, set the recursion depth to 1024 (which apparantly is the maximum value supported by Firebird as mentioned by Vladimir) and run your procedure.

set max_sp_recursion_depth = 1024;

Using the example you gave us, you can for instance update node 6, 4, 2 and 1 using the procedure with the given datetime:

CALL upd_test( 6, '2001-02-03 10:11:12' );

You can use the same approach for functions that start at a root and change all their children. And you can again use the same approach to create a delete procedure.

avk
  • 871
  • 1
  • 9
  • 22