0

I have a SQL Server table called AccessLog that is inserted into every time a user clicks "Log In" on my website. The important fields are UserID, BrowserName and Date. The intent is to get an idea of which browsers are used the most to access the site.

Here is a sample data set:

| UserID | BrowserName | Date  |
|--------|-------------|-------|
| 1      | Chrome      | 05/28 |
| 1      | Chrome      | 05/29 |
| 2      | Firefox     | 05/29 |
| 1      | Chrome      | 05/30 |
| 3      | Firefox     | 05/31 |

This is the SQL code I use (found here):

SELECT BrowserName
    ,(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER ()) pct
    ,COUNT(*) ct
FROM AccessLog
GROUP BY BrowserName

These are the results I get with that query:

| BrowserName | pct  | ct |
|-------------|------|----|
| Chrome      | 60.0 | 3  |
| Firefox     | 40.0 | 2  |

But these are the results I want:

| BrowserName | pct  | ct |
|-------------|------|----|
| Chrome      | 33.3 | 1  |
| Firefox     | 66.6 | 2  |

How can I modify my query to count each distinct UserID and BrowserName pair once, to not artificially inflate the counts?

Dale K
  • 25,246
  • 15
  • 42
  • 71

2 Answers2

1

You can try the following by creating a cte. Here is the demo.

with total_users as
(
    select
      BrowserName,
      count(distinct UserId) as total_users
    from myTable
    group by
      BrowserName
)    
select 
    BrowserName,
    round((total_users * 100.0)/sum(total_users) over (), 1) as pct,
    total_users as ct
from total_users

Output:

| browsername | pct   | ct  |
| ----------- | ----- | --- |
| Chrome      | 33.3  | 1   |
| Firefox     | 66.6  | 2   |
zealous
  • 7,336
  • 4
  • 16
  • 36
  • This is perfect - I cast the pct column to a decimal to remove the extra 0's, but otherwise perfect. Thanks very much. – aswans2012 Jun 01 '20 at 22:24
0

You want to use count(distinct). But that is not possible as a window function -- directly -- in SQL Server. So use the dense_rank() trick:

SELECT BrowserName,
       (COUNT(DISTINCT UserId) * 100.0 /
        (DENSE_RANK() OVER (ORDER BY UserId ASC) +
         DENSE_RANK() OVER (ORDER BY UserId DESC)
        )
       ) as pct,
      COUNT(DISTINCT UserId) as ct
FROM AccessLog
GROUP BY BrowserName
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks, but I got syntax errors from this in the lines beginning with SUM and COUNT. My best guess edit to remove those returned `Column 'AccessLog.UserId` is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.` I added `UserId` to the `GROUP BY` clause and got `Cannot perform an aggregate function on an expression containing an aggregate or a subquery.` – aswans2012 Jun 01 '20 at 22:23
  • @aswans2012 . . . That line was detritus. I thought I had deleted it. – Gordon Linoff Jun 01 '20 at 22:30
  • Still getting `Column 'AccessLog.UserId' is invalid in the select list...` etc. – aswans2012 Jun 01 '20 at 22:41
  • @aswans2012 . . . That column is in your sample data. – Gordon Linoff Jun 02 '20 at 01:08