9

Continuing with the spirit of using the Stack Exchange Data Explorer to learn SQL, (see: Can we become our own “Northwind” for teaching SQL / databases?), I've decided to try to write a query to answer a simple question (on meta): What % of stackoverflow users have over 10,000 rep?.

Here's what I've done:

Query#1

SELECT COUNT(*)
FROM Users
WHERE
  Users.Reputation >= 10000

Result:

556

Query#2

SELECT COUNT(*)
FROM
  USERS

Result:

227691

Now, how do I put them together into one query? What is this query idiom called? What do I need to write so I can get, say, a one-row three-column result like this:

556     227691      0,00244190592
Community
  • 1
  • 1
polygenelubricants
  • 376,812
  • 128
  • 561
  • 623
  • OK, just found out that there are lots of dupes that I should probably read first. I wasn't quick enough to delete before Mark Byers answered, so I undeleted it for now. Feel free to close for dupes if community feels like it. – polygenelubricants May 29 '10 at 07:31
  • I've fixed my answer. I'd be interested to see if there is a better way to do it. Also, can you link to the dupes, then we can decide if this should be closed or merged. – Mark Byers May 29 '10 at 07:35

6 Answers6

11

You can use a Common Table Expression (CTE):

WITH c1 AS (
    SELECT COUNT(*) AS cnt
    FROM Users
    WHERE Users.Reputation >= 10000
), c2 AS (
    SELECT COUNT(*) AS cnt
    FROM Users
)
SELECT c1.cnt, c2.cnt, CAST(c1.cnt AS FLOAT) / c2.cnt
FROM c1, c2
Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • Can you make this work on SEDE? Link to the query with result etc? I still get `incorrect syntax near c2` with this. – polygenelubricants May 29 '10 at 07:37
  • 1
    @polygenelubricants: Interesting that it doesn't work. Do you know what the database is? I tested the query in SQL Server 2008. – Mark Byers May 29 '10 at 07:40
  • 1
    I'm still puzzled why multiple CTEs aren't working in the SEDE. – Daniel Vassallo May 29 '10 at 07:53
  • 1
    @Daniel: I hope this doesn't mean that SEDE is too "broken" for learning =( – polygenelubricants May 29 '10 at 07:56
  • Oops, I just messed up copying it. JOIN without ON was the problem. The CTEs work fine now that I've fixed the error. – Mark Byers May 29 '10 at 08:03
  • 2
    Works fine now :) ... @polygenelubricants: Now you have three working versions to play with! ... This is the most elegant query IMO, but note that the `WITH` clause is not directly supported in some databases such as MySQL (http://stackoverflow.com/questions/1382573/) – Daniel Vassallo May 29 '10 at 08:04
3

Apart from using CTEs, in this case you could also have done:

SELECT CAST((SELECT COUNT(*) FROM Users WHERE Users.Reputation >= 10000) AS float)  /
       (SELECT COUNT(*) FROM USERS) * 100  AS Percentage​

The cast as float was to force a floating-point division, because with integer division 556 / 227691 would give 0.

Community
  • 1
  • 1
Daniel Vassallo
  • 337,827
  • 72
  • 505
  • 443
3

Thanks to the other answers here, I've written the following queries, all of which work on SEDE:

"Inline view"

SELECT *, CAST([10K] AS FLOAT)/[All] AS [Ratio]
FROM (
   SELECT
    (SELECT COUNT(*) FROM Users) AS [All],
    (SELECT COUNT(*) FROM Users Where Reputation >= 10000) AS [10K]
) AS UsersCount

(See query result)


Variables

DECLARE @numAll FLOAT
DECLARE @num10kers FLOAT

SET @numAll = (SELECT COUNT(*) FROM Users)
SET @num10kers = (SELECT COUNT(*) FROM Users WHERE Users.Reputation >= 10000);

SELECT  @num10kers AS [10K], @numAll AS [All], @num10Kers/@numAll AS [Ratio]

(See query result)

References


Common Table Expression

WITH Users10K AS ( 
    SELECT COUNT(*) AS Count
    FROM Users
    WHERE Users.Reputation >= 10000
), UsersAll AS (
    SELECT COUNT(*) As Count
    FROM Users
)
SELECT
    Users10K.Count AS [10K],
    UsersAll.Count AS [All],
    CAST(Users10K.Count AS FLOAT) / UsersAll.Count AS [Ratio]
FROM Users10K, UsersAll

(See query result)

References

polygenelubricants
  • 376,812
  • 128
  • 561
  • 623
3

For queries like this, where I'm doing multiple counts on a single table based on different criteria, I like to use SUM and CASE:

SELECT
    UsersCount.[10K],
    UsersCount.[All],
    (CAST(UsersCount.[10K] AS FLOAT) / UsersCount.[All]) AS [Ratio]
FROM
    (SELECT
         SUM(CASE
               WHEN Users.Reputation >= 10000 THEN 1
               ELSE 0
             END) AS [10K],
         COUNT(*) AS [All]
     FROM Users) AS UsersCount

(query results)

The advantage is that you're only scanning the Users table once, which may be significantly faster.

Cheran Shunmugavel
  • 8,319
  • 1
  • 33
  • 40
2
WITH tmp as (
SELECT COUNT(ID) AS repCount, (SELECT COUNT(ID) FROM Users ) AS totalCount
FROM Users
WHERE Users.Reputation > 10000
)
SELECT tmp.repCount, tmp.totalCount, (cast(tmp.repCount as decimal(10,2))/tmp.TotalCount) * 100 AS Percentage
FROM tmp

UPDATED: without the with

SELECT COUNT(ID) AS repCount, (SELECT COUNT(ID) FROM Users ) AS totalCount, 
    (CAST((SELECT COUNT(ID) FROM Users WHERE Users.Reputation > 10000) AS DECIMAL(10,2)) /
        (SELECT COUNT(ID) FROM Users )) * 100 AS Persantage
FROM Users
Thea
  • 7,879
  • 6
  • 28
  • 40
2

Using variables in MySQL:

SELECT @a:=(SELECT COUNT(*) FROM Users WHERE Users.Reputation >= 10000),
       @b:=(SELECT COUNT(*) FROM Users),
       IF(@b > 0, @a/@b, "--invalid--")
FROM Users
LIMIT 0,1
Anax
  • 9,122
  • 5
  • 34
  • 68