5

Im making an app where multiple users can post comments above or below other comments. This is not a thread-type structure. It's more like collaborating on a Word document. Im having trouble designing the method these entries are sorted.

Using mySQL and PHP, sorting by time of entry doesnt work, and neither does sorting by comment position because the position changes if user posts inbetween other comments. I dont want to have to re-serialize comment positions for every new entry (what if there are thousands of entries and dozens of users doing the same thing).

What is the best way to design this?

Gustav Bertram
  • 14,591
  • 3
  • 40
  • 65
Justin
  • 305
  • 1
  • 7

3 Answers3

1

What you are describing is a linked list. The problem is that they are usually hard to retrieve using just SQL. My solution is to use PHP to do the sorting upon retrieval.

Your table would look something like this:

CREATE TABLE page {
   page_id INT,
   first_comment_id INT
}

CREATE TABLE comment {
   comment_id INT PRIMARY KEY AUTOINCREMENT,
   page_id INT,
   next_comment_id INT
}

Your query is simple:

SELECT comment_id, next_comment_id 
FROM comment 
WHERE page_id = $page_id 
ORDER BY comment_id DESC

The important step is to massage the results from mysql_fetch_assoc() into an array that is indexed according to comment_id:

$result = mysql_query($sql);
$indexed_list = array();
while ($row = mysql_fetch_assoc($result)) 
{
    $indexed_list[$row['comment_id']] = $row;
}

Resulting in an array similar to this one:

$indexed_list = array(
    1 => array("comment_id"=>1, "next_comment_id"=>2),
    2 => array("comment_id"=>2, "next_comment_id"=>5),
    3 => array("comment_id"=>3, "next_comment_id"=>4),
    4 => array("comment_id"=>4, "next_comment_id"=>0),
    5 => array("comment_id"=>5, "next_comment_id"=>3));

The PHP function to sort them into displayable order is simple:

function llsort($indexed_list, $first_comment_id) 
{
    $sorted_list = array();

    $node = $indexed_list[$first_comment_id];
    array_push($sorted_list, $node);

    do
    {
        $node = $indexed_list[$node['next_comment_id']];
        array_push($sorted_list, $node);
    } while ($node['next_comment_id'] != 0 
        AND isset($indexed_list[$node['next_comment_id']]) );

    return $sorted_list;
}

You get first_comment_id from the page table. Of course, you still have to implement functions to insert a node and delete a node, but those are left as exercises for the reader. Don't forget to use transactions for inserting and deleting nodes.

More information on linked lists in MySQL:

Community
  • 1
  • 1
Gustav Bertram
  • 14,591
  • 3
  • 40
  • 65
0

this sounds like a good time to use MPTT, Modified Pre-ordered Tree Traversal. It's often used for threaded comment boards and things of that nature. Of all the ways to keep hierarchical structures in a RDBMS, it has the lowest overhead when pruning or adding nodes to the tree.

here is a good intro, and another. Googling around for it should get you some more info. It's not hard to implement at all once you understand the concept.

sbeam
  • 4,622
  • 6
  • 33
  • 43
-1

I would definitely go with ordering by position. When inserting, it's just a question of incrementing all entries below it -- a single update query. One important feature of that implementation is that it handles concurrency very well; if there are two concurrent inserts, you don't care in which order the incrementing is done (but you do need a non-positional pk so there's no upset when an insert happens above you).

An alternative is to model it as a tree, which means you only need to update entries below you in the branch. But it's going to be a rare situation where the maintenance overhead is justifiable. (A compromise is to model is as a weeping-willow -- you divide the total into chunks which form branches, but you don't allow branches from branches; that avoids ever having to update every single record; however I'm still guessing it's not worth the overhead in comparison to the first approach.)

niczero
  • 367
  • 1
  • 7
  • Note that Gustav's approach has the worst performance of the three suggestions. You want to be able to render quickly a possibly large resultset into HTML, so be sure to consider performance. – niczero Nov 10 '11 at 09:50