How to implement hierarchical cms sites?

Question

I'm coding a little learning CMS project and I've hit a brick wall that's stopping me to complete the next step. I know I should be taking KISS (Keep It Simple, Stupid) into account, but I think it would nice, to be able to group pages hierarchicaly.

The problem is that I want page [root]->fruits->tropical->bananas to be accessible only from this url: http://localhost/cms/fruits/tropical/bananas/. What I came up with until now is that cms table has a parent field that points to its parent. The question is: How to parse uri adress and select a row from DB with as few queries/efficiently as possible?

Table structure:
Id
Slug
...
...
...
ParentId

All help and advice is kindly accepted.

http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/ — , Mar 07 '12 at 22:29
http://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database — frail, Mar 14 '12 at 12:23

score 4 · Accepted Answer · answered Mar 16 '12 at 03:56

4

Here is the table structure that I used for testing this -

CREATE TABLE  `test`.`pages` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `slug` varchar(45) NOT NULL,
    `title` varchar(45) NOT NULL,
    `content` text NOT NULL,
    `parent_id` int(10) unsigned DEFAULT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `UQ_page_parent_id_slug` (`parent_id`,`slug`),
    CONSTRAINT `FK_page_parent_id` FOREIGN KEY (`parent_id`) REFERENCES `pages` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Note the unique key on (parent_id, slug). This is key to getting the best performance from the following query. I tested this with 50k rows and it still returned in less than 1ms for a five slug path - /cms/slug-1/slug-2/slug-3/slug-4/slug-5/

Here is the PHP code that I have come up with to build a suitable query -

<?php

// I will assume the rest of the url has already been stripped away
$url = '/fruits/tropical/bananas/';

// lets just make sure we don't have any leading or trailing /
$url = trim($url, '/');

// now let's split the remaining string based on the /
$aUrl = explode('/', $url);

/**
* Now let's build the query to retrieve this
*/

// this array stores the values to be bound to the query at the end
$aParams = array();

$field_list = 'SELECT p1.* ';
$tables = 'FROM pages p1 ';
$where = "WHERE p1.parent_id IS NULL AND p1.slug = ? ";

// this array stores the values to be bound to the query at the end
$aParams[] = $aUrl[0];

// if we have more than one element in our array we need to add to the query
$count = count($aUrl);

for ($i = 1; $i < $count; $i++) {

    // add another table to our query
    $table_alias = 'p' . ($i + 1);
    $prev_table_alias = 'p' . $i;
    $tables .= "INNER JOIN pages $table_alias ON {$prev_table_alias}.id = {$table_alias}.parent_id ";

    // add to where clause
    $where .= "AND {$table_alias}.slug = ? ";
    $aParams[] = $aUrl[$i];

    // overwrite the content of $field_list each time so we
    // only retrieve the data for the actual page requested
    $field_list = "SELECT {$table_alias}.* ";

}

$sql = $field_list . $tables . $where;

$result = $this->db->query($sql, $aParams);

answered Mar 16 '12 at 03:56

user1191247

10,808
2
22
32

This isn't as fast as my method. But can you explain a bit more what is an INNER JOIN? I usually use a LEFT JOIN to query adjacent list models recursively. – Micromega Mar 17 '12 at 07:57
This is actually significantly faster than your method as it will only access the pages contained within the path whereas your method will access every page in the table. In fact, your method will not do what the OP has asked for, unless I have completely misunderstood his question. [I suggest you read this page if you do not know what an INNER JOIN is](http://dev.mysql.com/doc/refman/5.0/en/join.html). – user1191247 Mar 17 '12 at 10:42
I didn't upvote you to get a pointer to a sensless site. Do you can describe a INNER JOIN by yourself or do you need an authoritive answer? I can easily Goggle myself? Also you use INNER JOIN, do you have some benchmark? – Micromega Mar 17 '12 at 11:05
I included in my answer a basic benchmark. With 50k pages in the hierarchy it took less than 1ms to retrieve a 5 slug path. INNER JOIN requires entries in both the left and right tables, whereas LEFT JOIN requires data in only the left table. There is nothing senseless about the MySQL Reference Manual. – user1191247 Mar 17 '12 at 11:25
Thank you. My method is showing also to paint the tree. Your method is just about accessing the adjacent list model. – Micromega Mar 17 '12 at 11:36
+1 for the recursive function to paint the tree but -1 for not answering the question. `The question is: How to parse uri adress and select a row from DB with as few queries/efficiently as possible?`. My method uses one query that will only access as many records as there are slugs in the url. There is not a more efficient way to do it with this adjacency list! – user1191247 Mar 17 '12 at 13:56
No. That's wrong. My method is superior. You are cheating and use for-loop to nest as many joins as needed. My method fetch the whole tree or part of it with where-clause and paint it with a single for-loop. – Micromega Mar 17 '12 at 14:02
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/8993/discussion-between-nnichols-and-david) – user1191247 Mar 17 '12 at 14:05

score 3 · Answer 2 · answered Mar 15 '12 at 12:41

3

If the page only exists on one url the easiest way to resolve the problem will be to store a hash of the full url in a indexed field of it's own:

SELECT * FROM table WHERE page = MD5('http://localhost/cms/fruits/tropical/bananas/')

Although if you are going to go down the hierarchical route you may find the following useful: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/

answered Mar 15 '12 at 12:41

Finbar Crago

432
5
12

Funny how those guys are arguing and your 3 sentence answer is the best of them. Hashing is fast and unique. Just update it if the page moves. All of these other answers are comical. – bkconrad Mar 18 '12 at 08:16
2

@bkconrad - I agree that this hashing solution is fast (not unique but possibly enough in this context) but it cannot return all the nodes in the path. You also have the issue of having to rehash every item in the branch when moving within hierarchy. You also have to store redundant data. – user1191247 Mar 18 '12 at 11:19

score 1 · Answer 3 · edited May 23 '17 at 12:28

1

I particularly like Bill Karwin's ( https://stackoverflow.com/users/20860/bill-karwin ) presentation on hierarchies:

http://en.slideshare.net/billkarwin/models-for-hierarchical-data

Recursively grab all data based on a parent id

You should take a look at his solution with a closure table. I found it particularly useful. It might be what you need.

edited May 23 '17 at 12:28

Community

1
1

answered Feb 27 '14 at 20:18

noderman

1,934
1
20
36

score 0 · Answer 4 · answered Mar 13 '12 at 10:28

You've tagged this question as CodeIgniter so this an answer specific to that.

You can use it's routeing capabilities to force the URL but handle the request in the required way.

What you would have is something like:

$route['cms/fruit/(:any)'] = 'fruit/$1';
$route['cms/fruit/(:any)/(:any)'] = 'fruit/$1/$2';

The first line would forward all URLs that start with cms/fruit to the fruit controller and pass across the fruit type as the first variable (maybe the fruit name as the second variable too). The second line is a fall back in case it didn't take care of the fruit name.

Combine this with the base path in the config and you could automatically set 'cms' in the URL too, if it's something that should always be in the URL.

score 0 · Answer 5 · answered Mar 14 '12 at 04:56

Are you using codeigniter for this development? This answer below is based on the web application framework from codeigniter. So here goes,

The problem is that I want page [root]->fruits->tropical->bananas to be accessible only from this url: http://localhost/cms/fruits/tropical/bananas/.

So create a function in your controller with function name fruits and with two parameters? For example

class Cms extends CI_Controller {
  ...
  ...
  ...
  public function __construct() {
     $this->load->model('cms_model');
  }

  public function fruits($tropical, $bananas) {

     $string = $this->cms_model->getPage($tropical, $bananas);

     // load the view you want.
     $this->load->view('');
  }
  ...
  ...
  ...

}

What I came up with until now is that cms table has a parent field that points to its parent. The question is: How to parse uri adress and select a row from DB with as few queries/efficiently as possible?

Table cms:
Id
Slug
ParentId

Table cms_parent:
Id

Let's describe with two example table shown above, cms table and cms parent table. You did not specify exactly in your question on what is your query want or the query result return. So below is my guess based on your question description that is, query two table by joining them using the common key and then apply the condition where.

// select * from cms t1 join cms_parent t2 on t1.ParentId = t2.Id where t1.Id = '' and t2.ParentId = 'level1';
public function getPage($level0, $level1) {
    $this->db->select('*');
    $this->db->from('cms');
    $this->db->join('cms_parent', 'cms.ParentId = cms_parent.Id');
    $this->db->where('cms.Id', $level0);
    $this->db->where('cms.ParentId', $level1);

    $query = $this->db->get();

    // return one row from database.
    return $query->row();
}

You don't get it completely... Every page stored in the DB can also be a parent to n more pages and has n levels of parents — JanLikar, Mar 14 '12 at 20:17
When you say I don't get it completely, **which** part are you referring to? Can you be specific? From what I understand from your comment, you think I do not understand is the part on how to model the page(N levels of page) in database and hence retrieve the page from database? — Jasonw, Mar 15 '12 at 01:10

How to implement hierarchical cms sites?

5 Answers5

Linked