3

Here is my schema using sqlite, I am not sure if this is a good way to create tree structure in sql as I have to traverse many time to obtain the entire tree, as oppose to extracting the entire tree base on top comment and construct the tree in python. Can someone give me some suggestions.

BEGIN;
CREATE TABLE "tree_comment" 
    ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, 
     "text" text NOT NULL, 
     "parent_id" integer NULL REFERENCES "tree_comment" ("id"));
CREATE INDEX "tree_comment_6be37982" ON "tree_comment" ("parent_id");

COMMIT;
Kevin
  • 2,187
  • 4
  • 26
  • 35
  • You makes flat tables and relationships among the nodes (rows) should be mentioned using ForeignKeys relationships. To dispatch in a tree like representation use [Join](https://www.google.co.in/url?sa=t&rct=j&q=&esrc=s&source=web&cd=7&cad=rja&uact=8&ved=0CDgQFjAG&url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FJoin_%2528SQL%2529&ei=hIswVaAal97wBeTRgIgM&usg=AFQjCNEH9tgidVxiZKXCDR6cq0IgH44DQg&bvm=bv.91071109,d.dGc) or something. Check this *MySQL* question and answer [mysql-tree-hierarchical-query](http://dba.stackexchange.com/questions/30021/mysql-tree-hierarchical-query) – Grijesh Chauhan Apr 17 '15 at 04:25
  • If you have tons of comments, you might not want to get the entire tree. For example reddit has lots of comments and gets only the first 200 ish or those above a certain threshold by default, and gets more by ajax. – Neil McGuigan Apr 17 '15 at 04:47
  • Which Python version? If the SQLite version shipped with it is new enough, it supports [recursive CTEs](http://www.sqlite.org/lang_with.html). – CL. Apr 17 '15 at 08:57

3 Answers3

3

Your example is the correct way to represent hierarchical data in a relational database. You should use Recursive Common Table Expressions (R CTE) to query the table.

In the past, you would have had to have used Nested Set or Materialized Path, but R CTE was purpose-built to fix their flaws, so use it.

SQLite supports Recursive CTE, and it's part of the SQL:1999 standard.

Here is an example query:

https://database-patterns.blogspot.com/2014/02/trees-paths-recursive-cte-postgresql.html

Neil McGuigan
  • 46,580
  • 12
  • 123
  • 152
1

Since the data structure doesn't have a natural representation in relational databases, there are several ways to store them depending on the use case and the database.

In this presentation, from slide 48, there are two methods described.

My favourite one is materialized path which is very simple and capable.

Hope it helps.

Javier
  • 2,752
  • 15
  • 30
  • 1
    why would you use materialized path when sqlite supports recursive cte? – Neil McGuigan Apr 18 '15 at 00:24
  • I think OP updated the question with more information since I posted (I don't remember him mentioning SQLite). In general, it depends on the way the nodes will be queried and the performance of the implementation of the DB. If you just want to retrieve the entire thing, materialized path requires no processing other than sorting on strings. – Javier Apr 18 '15 at 04:29
  • 1
    "the data structure doesn't have a natural representation in relational databases" is violently false. Kevin has implemented it in his table. The RM was *founded* on the Hierarchical model, it has all of those capabilities (plus much, much, more). Hilarious. – PerformanceDBA Apr 18 '15 at 11:01
  • Maybe I should have said native? You think relational algebra is related to graph theory? I never saw the relation. :) – Javier Apr 18 '15 at 15:06
  • (1.a) Hierachies are native in the RM. That was one of Codd's explcit goals, to overcome a specific limitation in the HM. (1.b) I repeat, Kevin has given it. If you can't see it, ask a question. (1.c) Read Jay's answer. (2) Don't put your words in my mouth, and tell me what I think. It tastes like crap and it is offensive. I didn't say that they were equal or related. I said what I said, it is on record. You can infer or imply or cross-breed or mangle whatever you like from it, but all that is yours, and you own it. Don't attribute it to me, it is dishonest. Sheesh. – PerformanceDBA Apr 20 '15 at 19:28
  • Wow, that's intense. :) I don't know how to understand "The RM was founded on the Hierarchical model". It's founded but not related? I have two questions in my previous post and a statement about my personal view. Nevertheless, if you have a reference of Codd's goals when creating SQL, I'd find that educational. – Javier Apr 20 '15 at 20:57
  • @Javier. *Intense*: which, the act of putting words in my mouth or the act of spitting them out ? Who said it was not related ? The RM is founded on, and therefore a progression of, the HM. Codd did not write his RM in isolation, sitting on one of Jupiter's dark moons. No. All good technology is an extension of previous **sound** technology. He worked for IBM and furthered IBM's technology, he met then and **and exceeded** them spectacularly. The detractors suppress hierarchies, which is why people don't know the basics, and (a) you wrote what you wrote (b) OP has the problems. – PerformanceDBA Apr 23 '15 at 09:41
  • @Javier. *Refs* buy a book. Or google and work through the garbage posted by the uneducated. *Two question* Sorry, I thought the answers were obvious, and I don't have to defend words that are not mine, they are yours. *Maybe I should have said [hierarchies do not have a] native representation in RDbs ?* That would be **violently wrong**, because they do. *Is relational algebra related to graph theory* No, they are entirely separate areas of mathematics. *I never saw the relation* ok. – PerformanceDBA Apr 23 '15 at 09:51
  • 1
    @Javier. Karvin has swallowed the suppression pill, his description of "adjacency list", in fact, describes Hierarchies. But he fails to understand that, to produce a "path" (concatenated string of either ancestors *or* descendants, or a subtree report, all one needs is a recursive function, which is dead easy to write. I suppose if he could, he would.He then gives three massively inefficient and stupid solutions, which form duplication; break 2NF; and require additional code. It is not SQL, it is some low-end NonSql. – PerformanceDBA Apr 23 '15 at 10:26
  • 1
    @Javier. Karvin then gives three **massively inefficient and stupid solutions**, which form duplication; break 2NF; have no integrity; and require additional code. It is not for SQL, it is for some low-end NonSql. For his slide 77, I have just one line, with unsuppressed **Hierachy** instead of the "adjacency list" of the suppressors, and "Query Subtree" = Dead Easy. Materialsed Path is a fancy name for those who like duplicated data; no integrity; and additional code that keeps breaking. I call it **Ignorance**, but when it is supported and elevated to "technology", I call it **Stupidity** – PerformanceDBA Apr 23 '15 at 10:31
  • 1
    @PerformanceDBA, your points on duplication, the loss of referential integrity and normalization are solid. – Javier Apr 24 '15 at 00:52
1

Yes, the conventional way to represent a tree in a relational database is just as you did in your example: let each node in the tree have an ID, and a parent_ID. Create indexes on both for rapid retrieval.

The problem with trees in RDBs is not creating them, it's retrieving them efficiently. There's no query in standard SQL that will retrieve an entire tree or sub-tree. You have to write a loop.

Jay
  • 26,876
  • 10
  • 61
  • 112
  • No standard? http://en.wikipedia.org/wiki/SQL:1999#Common_table_expressions_and_recursive_queries – Neil McGuigan Apr 18 '15 at 00:27
  • @NeilMcGuigan is correct but almost all implementations of trees allow to retrieve subsets of the tree. Look at the first link in my post. – Javier Apr 18 '15 at 04:33
  • @Jay. Your answer is correct except for one point. You don't need a loop (or CTEs, or similar). You need recursion in the server, which the SQLs have, but the Non-SQLs don't have. All you need is a very small proc, which is written to be recursive. – PerformanceDBA Apr 18 '15 at 11:06
  • @PerformanceDBA Okay, yes, not a loop, but a function that calls itself recursively. The question doesn't say what SQL engine this person is using, and capabilities of sprocs aren't really standard, but I suppose it's true that at least most SQL engines have sproc features that would let you write such a function on the server side. But in any case, if you can't, you could always write the code in the calling program. Also, Oracle has a feature to chase trees with a single SELECT. It's been so long since I've used it I forget the command. – Jay Apr 20 '15 at 13:07
  • @Jay. Yes. (1) The issue is not whether the platform has sprocs, it is whether the platform supports recursion. The low end, the pretend-SQLs, don't. Oracle (which is low end) can't. It cacks itself on subqueries. Therefore they have to provide a CTE feature, to move it the direction of high end recursion. (2) Functions for scalars, sprocs for rows. – PerformanceDBA Apr 20 '15 at 15:40
  • @PerformanceDBA Oracle has a simple syntax to handle recursive queries: START AT / CONNECT BY. Until today I was not aware that you could use CTEs to execute a recursive query. Cool. RE (2) I was using the word "function" in the generic sense of a block of code that performs some job. But if you want to be pedantic, MS SQL supports "table-valued functions" as well as "scalar-valued functions". – Jay Apr 20 '15 at 19:14
  • @Jay. (1.a) since you have agreed that "loop" is incorrect, change your answer to suit, and attract a vote or a choice. (1.b) The one and only purpose of CTE (and the Oracle equivalent) is to perform a recursive query in a single statement (ie. avoid having to write a proc). (2) Ok, so MS now has "table-valued functions". No idea what it is for, because since 1984, SQL queries produce table valued results. It is called a"subquery". Perhaps it is for newbies who can't write subqueries. Perhaps it is the produce-result-set-as-a-table feature that we have had since 1997. – PerformanceDBA Apr 20 '15 at 19:39
  • RE CTEs "one and only purpose": Not disagreeing. My intent was to say, "I didn't realize MS SQL had a facility to perform recursive queries", not, "I didn't realize that among the many uses of this feature was the ability to perform recursive queries." – Jay Apr 20 '15 at 19:49
  • @Jay. Kevin has already done all that you ask for in your 1st para. If you enhance your answer to include the content of these comments (clarifying the methods), I will vote for it. Producing either ancestors/descendants in a "path" or a subtree rpt, is dead easy with recursion. – PerformanceDBA Apr 23 '15 at 10:39