0

I'm writing a system similar to a blog, where a post can have any number of comments. Counting the number of top-level comments on a post is trivial, but the problem is that each comment can have child comments. I'm not sure how to get the total number of both the top-level comments and all child comments under them. How can I go about this?

For simplicity, all posts have only an ID and all comments will have exactly one of either a post ID or a parent comment ID.

Post:

Id

Comment:

Id
PostId
ParentCommentId

Example comment data. The rows with a PostId are comments directly on the post itself: top-level comments. The ones with a ParentCommentId are replies to other comments.

Id | PostId | ParentCommentId
---|--------|----------------
1  |   1    | NULL
2  |  NULL  | 1
3  |  NULL  | 2
4  |  NULL  | 2
5  |   1    | NULL
6  |   2    | NULL
7  |   2    | NULL
8  |  NULL  | 6
9  |  NULL  | 6

This would be the layout of the above posts/comments:

POST 1:
--comment 1
  --comment 2
    --comment 3
    --comment 4
--comment 5

POST 2:
--comment 6
  --comment 8
  --comment 9
--comment 7

Querying post ID 1 should return 5, and post ID 2 should return 4.

vaindil
  • 7,536
  • 21
  • 68
  • 127
  • So `PostId` is the "displayed" number of the post, but `ParentCommentId` refers back to `Id`, _not_ `PostId`? The `PostId` 1 is related to `Id`s 1 and 5 directly, 2 as a parent and 3 and 4 as a grandparent for a total of 5, right? Similarly `PostId` 2 is related directly to `Id`s 6 and 7 and parent to 8 and 9 for a total of 4. A better explanation of your example would have been, um, better. – HABO Jan 15 '17 at 03:29
  • A recursive CTE ought to handle it, at least to a depth of about 32K. What have you tried? – HABO Jan 15 '17 at 04:08

2 Answers2

1

Perchance this?

-- Sample data.
declare @Samples as Table ( Id Int Identity, PostId Int, ParentCommentId Int );
insert into @Samples ( PostId, ParentCommentId ) values
  ( 1, NULL ), ( NULL, 1 ), ( NULL, 2 ), ( NULL, 2 ),
  ( 1, NULL ), ( 2, NULL ), ( 2, NULL ),
  ( NULL, 6 ), ( NULL, 6 );
select * from @Samples;

-- The query.
with Tree as (
  -- Top level comments.
  select Id, PostId
    from @Samples
    where PostId is not NULL -- You can specify a PostId here, e.g. "where PostId = 1".
  union all
  -- Children, one generation at a time.
  select S.Id, T.PostId
    from Tree as T inner join
      @Samples as S on S.ParentCommentId = T.Id
  )
  select PostId, Count( 42 ) as Comments
    from Tree
    group by PostId;

To see the intermediate results replace the final select with select * from Tree;.

HABO
  • 15,314
  • 5
  • 39
  • 57
  • This looks awesome! I can't figure out what the `Count( 42 )` in the final `select` is doing though. – vaindil Jan 17 '17 at 21:35
  • You wanted the `Count` of comments `group by PostId`. [`Count`](https://msdn.microsoft.com/en-us/library/ms175997.aspx) takes an argument and counts all of the rows for which the argument is not `NULL`. This is one of the places where an asterisk (`*`) works, but some people have **"never use * in a query"** beaten into them. A constant expression also works and is unlikely to be `NULL`. You can use `1` since counting ones make sense or `0` just for the amusement value. I use [`42`](https://en.wikipedia.org/wiki/Phrases_from_The_Hitchhiker's_Guide_to_the_Galaxy) due to its special meaning. – HABO Jan 17 '17 at 23:13
  • Ah, okay, I didn't think of `Count` being used that way, the constant threw me off. I assumed you picked 42 for that reason though. ;) Thank you so much! – vaindil Jan 18 '17 at 01:11
0

I think this question and answer will get to your problem. Simplest way to do a recursive self-join in SQL Server? I do have a question: In your data set, you have two records with PostID =1 and no commentID. Shouldn't they be different?

Community
  • 1
  • 1
Adam Jacobson
  • 564
  • 4
  • 9