-1

I'm creating a website which has posts and a commenting section. I'm very interested in "Lower bound of Wilson score confidence interval for a Bernoulli parameter" sorting algorithm which is explained by Evan Miller here.

And, my comment system has nested comments. How I store the comments in the postscomments table in MYSQL is as follows.

Id | ParentId | Comment             | PostId | ...
1    NULL       Parent Comment        1
2    1          Child comment         1
3    2          Grand Child comment   1

Inserting, updating, and deleting just work as a piece of cake.

Selecting is the worst thing. Let me tell you how I need it to be selected.

When I receive an AJAX request for my get-comments.php file, I need to get the comments for a specific post. (AJAX request sends the post ID). So, I need to select the comments of that post sorted according to that algorithm.

But, the problem is that I only need to select 30 parent comments, and 5 child comments per each parent. I'm not sure about a way to do the sorting and grouping at once in MYSQL query. Here's what I tried.

  • Recursive Query It didn't work for two reasons. I cannot group the results into parents and I cannot limit the resutls. (LIMIT is not supported inside WITH RECURSIVE)
  • Other hierarchical storage methods I tried some of them. But, they didn't work too.

So, finally I came up with SELECTING ALL and processing with PHP which succeeded 100%.

Here's the long PHP script that does the thing. It works very well. Returns grouped results, sorted correctly.

But, the problem I have is that, I have no comments limit per each post. So, If I get 100,000 or more comments per post, will my server get overloaded? There's not just 1 post, so I think this is not the best solution.

Do you have a better solution this?

Is there a way that I can do all of these things in a MYSQL query and reduce the number of rows that I select?

Thanks in advance.

Added:

Here's the structure of JSON Response that I need to get.

{
   "parent": [
        // sorted parent comments
        {
            "Id": 4,
            "UserId": 2,
            "PostId": 1,
            // and other comment data
        },
        {
            // another comment
        }
   ],
   "4": [
       // sorted array of children of comment 4
       {
           // comment 4's  child
       },
       {
           // comment 4's child
       }
   ],
   "6": [
       // sorted array of children of comment 6
   ]
}

Is there a way that I can first sort the comments on that algorithm, then select only the first 30 parent comments and only 5 child comments per each parent just using a MYSQL Query? (If I could do that, I can group that result to this JSON with PHP)

Supun Kavinda
  • 1,355
  • 14
  • 13
  • 1
    You've given us a lot of information, but I actually don't see a clear problem statement. What is the exact problem? – Tim Biegeleisen Jun 21 '19 at 05:20
  • My question is that, is it okay to select all rows and process. If not, is there a better way that I can select only the rows I need directly from MYSQL? – Supun Kavinda Jun 21 '19 at 05:25
  • 1
    The answer is that generally you do as much heavy lifting of data on MySQL versus PHP as is possible. That is, you should try to handle your requirement with a query on MySQL. If you post clear table input and the expected output, maybe someone can help you with that. – Tim Biegeleisen Jun 21 '19 at 05:26

1 Answers1

1

Everytime you are storing new vote, do:

  1. Store sum of upvotes and downvotes on each comment (you can do just "current+=1") - you will get rid of the "LEFT JOIN postscommentsvotes"
  2. Store the result of the crazy expression in ORDER BY into a column in the comments table as well. (lets call it Priority)

Separate the search query into two both using the precomputed Priority for sorting:

  1. searches for parents with WHERE PostId = ? ORDER BY Priority DESC LIMIT 30. Compount key (PostId, Priority) may be useful.
  2. query for all Id, ParentId WHERE PostId = ? ORDER BY ParentId, Id
  3. construct a tree from the result and exclude branches not belonging under comments from first query.
  4. query for children using ParentId IN (?) with parent ids being all ids of comments from given branch of the preconstructed id-tree (do this for each parent or compose a UNION)

Alternatively, query step 2 and process step 3 for each parent separately, but they may overlap, hard to say what will be better.

Btw, the first part may be optional, but since you split the search query in two both using the Priority, and potentialy both using it on the same rows, it also fastens up because you only compute it once for each upvote/downvote, instead of once or twice for each post detail request.

slepic
  • 641
  • 4
  • 11
  • Thanks, I tried approach earlier. I think keeping votes in a separate table is needed as those votes saves more details like user id, create time etc. And, left joining `postscommentsvotes` is not a problem for me. ORDER BY is also fine. And, I tried your suggestion earlier. But, a huge problem occurs. The comments are nested. So, child comments have child comments. (Any child comment can be a parent to another comment). So, this would not work. – Supun Kavinda Jun 21 '19 at 07:48
  • Yeah I've been thinking about it. You are right the first part is totaly optional. Sure you can recompute in the search query. But it will really optimize the speed, especially once the table is huge. The searches are much more common then the inserts (not saying the inserts are rare tho). I've also realized that my solution only searches immedate children, this wasnt clear from your question. – slepic Jun 21 '19 at 08:03
  • I have updated my answer to fit the request for nested children. Btw, ad postcommentsvotes - I dont say you remove the table, of course it stores additional data, but you may query it only for the current user's entries for all the comments loaded as a result of steps 1-5 of the search process.That way it separated from the vote-priority mechanism and can only take care of votes for comments that are ultimately going to be returned to the client. ping @SupunKavinda – slepic Jun 21 '19 at 08:24
  • @slipic, How can I do the third part? Constructing a tree? Also, is there a way that I can limit the results per each IN(?) – Supun Kavinda Jun 22 '19 at 01:10
  • 1
    @SupunKavinda Intentialy in step 2 i have ordered the result so that older comments come first, that way you can iterate the result and have parents always come before their children. then just put them in an array stucture like this: Actualy thinking about it it may not need to be a tree, this might suffice: [1 => [2,3], 2=>[5,8], 3=>[4,6,7], 4=>[], 5=>[], 6=>[], 7=>[]]] (keys being parent id, values being array of all direct children ids) – slepic Jun 22 '19 at 09:52
  • 1
    just as example now if you want all id of comments starting with parent id 1 - you go like this, put the 1 in the bucket, take ids of the children (2,3), and one after the other, put them in the bucket too, look for its children (2 has 5 and 8, 3 has 4,,6,7), if it has any, repeat the proess - put them in the bucket as well and look for their children to repeat the process again... until all children have been put to bucket and no more comments in the branch have any more children. – slepic Jun 22 '19 at 09:59
  • 1
    Combining into branhes could yield a result like this: [1=>[1,2,3,4,5,6,7,8], 2=>[2,5,8], 3=>[3,4,6,7], 4=>[4], 5=>[5], 6=>[6], 7=>[7], 8=>[8]] - here keys are ids of a comment and values array of all ids of comments in that branch.Pick only keys returned in step 1 as parentIds, and plug the values array as the IN (?) argument of the query in step 4 – slepic Jun 22 '19 at 10:09
  • 1
    Or, actualy they shouldnt contain themselves as a child, so rather something like this: [1=>[2,3,4,5,6,7,8], 2=>[5,8], 3=>[4,6,7], 4=>[], 5=>[], 6=>[], 7=>[], 8=>[]] Or do they, idk now, you will have to figure :) – slepic Jun 22 '19 at 10:15
  • 1
    Hope you're getting the general idea... Anyway let me say one more thing, this approach is investigating all the rows in a topic (more or less), just as yours did, difference is that this way you only investigate 2 columns of all the rows (and actualy probably only indexes will be used for that query), before you fetch the rest of the data only for rows you selected for the final result. Having two integers per row in memory shouldnt use more then a MB even for topics with thousands of comments... – slepic Jun 22 '19 at 10:26
  • slepic, thank you very much for your time. I got what you meant. It's more efficient in my point of view. I can implement this with the help of PHP, but I'm wondering if there's a way to do it in MYSQL directly. I'm not sure, but may be triggers? or something else? – Supun Kavinda Jun 22 '19 at 15:22
  • Triggers won't help you as there is nothing that would trigger them (only insert/update/delete invokes triggers, not selects).Stored procedures might be of help, but Im not sure about the ability of mysql to provide sufficient data types to hold this structure. Btw nothing wrong with moving part of the algorithm to PHP, but it is worth reducing the data travelling between webserver and db server to what is really needed. – slepic Jun 23 '19 at 00:07
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/195418/discussion-between-slepic-and-supun-kavinda). – slepic Jun 23 '19 at 00:13