4

I've got a database that manages files - some files contain/reference other files, and my goal is to design a query that can give me the whole "tree" for a given document.

For example the structure may look like this:

  • File 1
    • File 2
    • File 3
    • File 4
    • File 5
      • File 6
      • File 7
      • File 8
        • File 9
        • File 10

etc., where File 1 effectively contains all of the files following it

These are broken out in my database between two tables - lets call them the "Files" table and the "References" table

The "Files" table has information about the files themselves - FileID, Filename, etc.

The "References" table shows the relationship of the above structure using the FileIDs of the files. My issue is that, for example, File 6 is not referenced by File 1 - it is only referenced by File 5.

e.g.:

[ParentFileID]  [ChildFileID]

1               2
1               3
1               4
1               5
5               6
5               7
5               8
8               9
8               10

Ideally I'd like to be able to check the position in the entire structure for any given FileID I pass in

Any ideas? I've been reading up on CTEs and it feels like some sort of recursive common table expression is what I'm after, though every example I can find is using one table and involves NULLs to track down the top level elements.

McFixit
  • 437
  • 3
  • 15
  • Possible duplicate of [Hierarchical Queries in SQL Server 2005](http://stackoverflow.com/questions/235515/hierarchical-queries-in-sql-server-2005) – Jakub Kania Feb 03 '16 at 13:49

1 Answers1

1

Yes, it can be done using a recursive CTE.

USE tempdb
GO
CREATE TABLE files
(
    [file_id] int PRIMARY KEY,
    [file_name] varchar(128) NOT NULL
);
INSERT INTO files VALUES
(1, 'File 1'),
(2, 'File 2'),
(3, 'File 3'),
(4, 'File 4'),
(5, 'File 5'),
(6, 'File 6'),
(7, 'File 7'),
(8, 'File 8'),
(9, 'File 9'),
(10, 'File 10');


CREATE TABLE [references]
(
    parent_file_id int NOT NULL,
    child_file_id int NOT NULL,
    PRIMARY KEY (child_file_id) 
);
INSERT INTO [references] VALUES
(1, 2),
(1, 3),
(1, 4),
(1, 5),
(5, 6),
(5, 7),
(5, 8),
(8, 9),
(8, 10);
GO

CREATE FUNCTION dbo.get_file_with_path(@file_id int)
RETURNS TABLE
AS
RETURN WITH h
AS
(
    SELECT 
        f.file_id, f.file_id as child_file_id, 
        f.file_name, 0 as reverse_level, 
        CAST( '/' + f.file_name as varchar(8000)) as path
    FROM 
        dbo.files f

    WHERE
        f.file_id = @file_id

    UNION ALL

    SELECT 
        h.file_id, r.parent_file_id as child_file_id,
        h.file_name, h.reverse_level + 1 as reverse_level,
        CAST('/' + f.file_name + h.path as varchar(8000)) as path
    FROM
        h 
        INNER JOIN  [references] r
            ON h.child_file_id = r.child_file_id
        INNER JOIN dbo.files f
            ON f.file_id = r.parent_file_id
)
SELECT TOP(1) h.file_id, h.file_name, h.path 
FROM h 
ORDER BY h.reverse_level DESC;
GO

SELECT *
FROM dbo.get_file_with_path(1)
UNION ALL
SELECT *
FROM dbo.get_file_with_path(3)
UNION ALL
SELECT *
FROM dbo.get_file_with_path(6)
UNION ALL
SELECT *
FROM dbo.get_file_with_path(10)

Output:

| file_id | file_name |             path              |
|---------|-----------|-------------------------------|
|       1 | File 1    | /File 1                       |
|       3 | File 3    | /File 1/File 3                |
|       6 | File 6    | /File 1/File 5/File 6         |
|      10 | File 10   | /File 1/File 5/File 8/File 10 |

I assume you mean path when you say position

EDIT:

Answering the question in comments, you can also create a table valued function that returns the sub-tree below a given node:

CREATE FUNCTION dbo.get_file_subtree_excluding_self(@file_id int)
RETURNS TABLE
AS RETURN
WITH h AS
(
    SELECT r.parent_file_id, r.child_file_id
    FROM [references] r
    WHERE r.parent_file_id = @file_id

    UNION ALL

    SELECT r.parent_file_id, r.child_file_id
    FROM
        h INNER JOIN [references] r
            ON h.child_file_id = r.parent_file_id

)
SELECT h.child_file_id as [file_id]
FROM h
GO

SELECT * FROM dbo.get_file_subtree_excluding_self(5)

Output:

+---------+
| file_id |
+---------+
|       6 |
|       7 |
|       8 |
|       9 |
|      10 |
+---------+

The table references describes a graph. One node can have only one parent because of the primary key, but nothing prevents cycles. For example, consider the following data:

+-------+--------+
| child | parent |
+-------+--------+
|     1 |      2 |
|     2 |      3 |
|     3 |      1 |
+-------+--------+

As you can see, there are cycles on this graph.

Jesús López
  • 8,338
  • 7
  • 40
  • 66
  • 1
    This is pretty great. Is there also a way that I could send in an ID and get all of the IDs back of the files within it? i.e. if i sent in the ID for file 5, it would return 6, 7, 8, 9 and 10? – McFixit Feb 04 '16 at 13:58
  • 1
    the function in the edit always appears to hit the maximum recursion limit before statement completion regardless of what I set the limit to be (and allowing for infinite loops runs indefinitely) – McFixit Feb 04 '16 at 18:26
  • I'm noticing now that some do work, so I'm assuming there's some kind of circular reference issue in the database – McFixit Feb 04 '16 at 18:35
  • Yes, there are circular references. Note the DDL of references table, it has `child_file_id`as the primary key, this prevent circular references. Without that primary key you don't have a tree, you have a graph. – Jesús López Feb 04 '16 at 21:36
  • Without that primary key, a file can have more than one parent. This structure is a graph, it might be a DAG (Directed Acyclic Graph) or it might be a ciclic graph, i.e, circular references might happen. – Jesús López Feb 04 '16 at 21:41