2

I would like to model a hierarchy/directory like what you see below, in a mysql table. You can see below the table schema I was thinking. However, the directory Im talking about would be comprised of 100.000 elements and the depth would be ~5-10 levels. Moreover, we will have a tags pool and each element of the directory may be linked to one or more tags. So I was wondering if there is better approach. I was reading that some people decide to design tables that are not canonical for the shake of high performance and I am evaluating this case too.

ps: some people use Multi-way Trees to model this in the programming language level so the question how this ends up in the database remains.

hierarchy:
A
| -> 1
     |->1
     |->2
| -> 2
| -> 3
B
| -> 1
| -> 2

table:
 ___________________________
| id      |element | father |
|---------------------------|
|  000    |   A    |  null  |
|  001    |   1    |  000   |
|  002    |   1    |  001   |
|  003    |   2    |  001   |
|  004    |   2    |  000   |
|  005    |   3    |  000   |
|  006    |   B    |  null  |
|  001    |   1    |  006   |
|  002    |   2    |  006   |
-----------------------------
nonouco
  • 1,170
  • 1
  • 13
  • 25
  • What's the question? You define a reasonable schema for your problem. Do you need help with constraints? With storing attributes/tags? – Patrick87 Jul 28 '11 at 13:11
  • My question is if there is a better than the approach I am presenting. Im worried about the linking of each element with a large number of tags. joins cost. – nonouco Jul 28 '11 at 14:02
  • The best solution is the one which meets your requirements (including performance requirements); Have you tried what you've designed here? Where does it fail? – Stephanie Page Jul 28 '11 at 19:32
  • I have tried but it is unlikely that I can simulate the prod env. conditions. I can only test on my local pc which is fast. Since the modeling of the hierarchy seems fine, I believe that I will have performance issues when I join this table with the tags table. Im looking for a non canonical approach as an alternative. – nonouco Jul 29 '11 at 09:45
  • Well if you can't simulate a production-like environment you'll be shooting in the dark as to the outcome of the actual design. You'll only get generic best-practice stuff here. It's purely a YMMV help system... always left up to the help-ee to take the advice and test it with actual data. – Stephanie Page Jul 29 '11 at 22:19
  • Its impossible to have always the luxury of production simulation. If I create a dummy dataset of 100.000 records - which will be like my production and hit it with a tool to simulate 10.000 users, I will get close but I will miss having the same server configuration as my client and the same network conditions. I think you can get close but not at a level where you can trust your findings. – nonouco Aug 04 '11 at 08:01

1 Answers1

5

A very fast hierachical tree is a nested set or a Celko-tree, it's a bit like a binary tree, or a huffman tree when you have a MySQL storage engine. Disadvantage is expensive deletion and insertion. Other RDBMS supports also recursive queries. In general I didn't see many nested sets. It seems to be complicated too create and maintain. When the nested set is too complicated and the RDBMS doesn't support recursive queries there is also materialized path.

  1. http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html
  2. http://en.wikipedia.org/wiki/Binary_tree
  3. http://en.wikipedia.org/wiki/Huffman_coding
  4. http://www.postgresql.org/docs/8.4/static/queries-with.html
  5. Is it possible to make a recursive SQL query?
  6. http://www.cybertec.at/pgbook/node122.html
Community
  • 1
  • 1
Micromega
  • 12,486
  • 7
  • 35
  • 72