2

I need to properly order a string column that contains a delimited group of numbers, such as:

1
1.1
1.1.1
1.2
1.2.1
1.10
2
2.1

These numbers are used to define a tree: for instance 1 and 2 are top level nodes, and 1.1, 1.2, and 1.10 are immediate children of 1. The tree can be arbitrarily deep so there is no upper bound per se on the number of periods that are in each entry (although in practice a character limit on the column will enforce this).

The problem I am having is that a standard ORDER BY operation in SQL will list 1.10 ahead of 1.2. This is of course expected, but unfortunately not what I want, since 10 > 2. Is there an efficient way to get my ordering? I am using MySQL.

Note that I am not necessarily wedded to using this encoding for a tree, so if there is a different encoding that is easier to order than feel free to suggest it. However, one nice thing that this structure offers me is an easy way to recover either all the upstream parent nodes or all the downstream child nodes in one pass, which is not something that would work with a more typical row/parent_row model (as far as I know). For instance, given id 1.2.1.4.5 I know that the four ancestors are 1, 1.2, 1.2.1, and 1.2.1.4, and all children (both direct descendants and children of children) will have an id that starts with 1.2.1.4.5..

Abiel
  • 5,251
  • 9
  • 54
  • 74
  • how deep is the tree? how many top-level nodes? also no upper bound? – Andronicus Dec 12 '11 at 01:35
  • See this question for a "better" structure: http://stackoverflow.com/questions/5916482/php-mysql-best-tree-structure – Marc B Dec 12 '11 at 01:36
  • @Riyono: in practice the tree probably won't get deeper than 10 levels. – Abiel Dec 12 '11 at 02:19
  • @Marc: Thanks, I may use that structure. The only thing I don't like about it is that it looks like it tends to generate a large number of changes when a single node changes, by causing many left and right positions of other nodes to change. For certain reasons I may be forced to store revisions to the tree, so in this respect fewer node changes are better. – Abiel Dec 12 '11 at 02:55
  • and how many child nodes (again, in practice) will a parent node have? (wondering if we could take a different approach to keep the tree) – Andronicus Dec 12 '11 at 05:06
  • @Riyono: Each node probably doesn't have more than 50 direct children, though that is just a rule of thumb not a enforced constraint; when you consider children of children and so on then the number could be potentially millions. – Abiel Dec 12 '11 at 06:16
  • Could you name them a-z instead of 1-n? It would allow you to implement this cheaply up to 26 levels (using just one char for the level means normal order by will work). If you need more, consider using 0-9a-zA-Z as your chars - allowing 62 levels – Bohemian Dec 12 '11 at 07:21

1 Answers1

0

@Abiel, Based on your comment that each node probably doesn't have more than 50 direct children, and in that case (where 62 child nodes will be enough)—as @Bohemian stated on the comment—we could use computer-readable code 0-9A-Za-z (Base 62 Encoding), we could set a new VARCHAR( 255 ) BINARY column to help us with sorting.

CREATE TABLE  `test`.`tree` (
  `computer_readable` VARCHAR( 255 ) BINARY NOT NULL ,
  `human_readable` VARCHAR( 255 ) NOT NULL ,
  `title` VARCHAR( 255 ) NOT NULL ,
  PRIMARY KEY (  `computer_readable` ) ,
  INDEX (  `human_readable` )
);

As you might already noticed, the computer_readable column is defined as BINARY, so we could have up to 62 [0-9A-Za-z] (instead of 36 [0-9a-z]) child nodes for each node.

mysql> SELECT 'a' = 'A';
+-----------+
| 'a' = 'A' |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

mysql> SELECT BINARY 'a' = 'A';
+------------------+
| BINARY 'a' = 'A' |
+------------------+
|                0 |
+------------------+
1 row in set (0.00 sec)

mysql> SELECT BINARY 'a' > 'A';
+------------------+
| BINARY 'a' > 'A' |
+------------------+
|                1 |
+------------------+
1 row in set (0.00 sec)

So, with this new field, we could have the data in this format (stated in CSV below):

"1";"1";"Books"
"11";"1.1";"Fiction"
"111";"1.1.1";"Science-Fiction"
"12";"1.2";"Self-Help"
"121";"1.2.1";"Motivational"
"1A";"1.10";"Textbooks"
"1e";"1.40";"Coloring Books"
"2";"2";"Music"
"21";"2.1";"Classical"

Which MySQL will happily sort for us

mysql> SELECT * FROM `tree` ORDER BY `computer_readable`;
+-------------------+----------------+-----------------+
| computer_readable | human_readable | title           |
+-------------------+----------------+-----------------+
| 1                 | 1              | Books           |
| 11                | 1.1            | Fiction         |
| 111               | 1.1.1          | Science-Fiction |
| 12                | 1.2            | Self-Help       |
| 121               | 1.2.1          | Motivational    |
| 1A                | 1.10           | Textbooks       |
| 1e                | 1.40           | Coloring Books  |
| 2                 | 2              | Music           |
| 21                | 2.1            | Classical       |
+-------------------+----------------+-----------------+
9 rows in set (0.00 sec)

Hope this helps.

Andronicus
  • 1,389
  • 11
  • 13