0

I am querying the Stack Exchange Database at https://data.stackexchange.com/stackoverflow/query/new

to count the total number of posts that have the same AnswerCountvalue as another post.

This is my attempt which causes an error Something unexpected went wrong while running your query. Don't worry, blame is already being assigned.

The idea is to increase the counter after a post/record in the Posts table has a single match for its AnswerCount value with another post/record, while not counting the match with its own AnswerCount value.

Select Count(*)
From Posts as p1
Join Posts as p2
On p2.Id = {
            Select Top 1 Id
            From p2
            Where p1.AnswerCount = p2.AnswerCount
            And p1.Id <> p2.Id
            };

This is Stack Exchange post I am using as a reference: How to Join to first row

Salazar
  • 3
  • 1
  • Your current query is almost a Cross join, But isn't this exactly the same as the count of rows with that AnswerCount minus 1? – dnoeth Aug 29 '20 at 10:04

2 Answers2

1

Your logic is on the right track, but I would use exists here:

SELECT COUNT(*)
FROM Posts p1
WHERE EXISTS (SELECT 1 FROM Posts p2
              WHERE p2.Id <> p1.Id AND p2.AnswerCount = p1.AnswerCount);

Read in plain English, the above query says to count every post record for which we can find a different post record with the same AnswerCount value by a different Id value (implying it is a different record).

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • How to return only a single-record (I can't use `Top 1` in a subquery), instead of the multiple records returned by the `Select 1...` – Salazar Aug 29 '20 at 07:17
  • `SELECT 1` does not return "multiple" records, and in fact `EXISTS` will stop scanning the table as soon as it finds a _single_ matching record. The edge case you are worrying over is not really there. – Tim Biegeleisen Aug 29 '20 at 07:23
1

count the total number of posts that have the same AnswerCountvalue as another post.

Your logic is way to complex. The number of other rows with the same answer count as the current row is exactly the same as the number of rows with this answer count minus 1. Only if the count equals 1 there's no other answer, thus:

select sum(cnt)
from
 ( -- count per AnswerCount
   Select AnswerCount, Count(*) as cnt
   From Posts
   group by AnswerCount
   having count(*) > 1 -- remove unique AnswerCount
 ) as dt

Or to add more details:

select sum(cnt) as answers
  ,sum(case when cnt > 1 then cnt end) as same_AnswerCount_as others
  ,sum(case when cnt = 1 then cnt end) as Unique_AnswerCount
  ,max(AnswerCount)
from
 ( 
   Select AnswerCount, Count(*) as cnt
   From Posts
   group by AnswerCount
 ) as dt

Btw, Data Explorer currently fails with this error message for even the simplest queries.

dnoeth
  • 59,503
  • 4
  • 39
  • 56