1

I am using below query in SQL Server to find distinct number of logins made in last 7 days(excluding today's date):

SELECT TOP (7) CONVERT(date, LoginTime) AS ActivityDate, COUNT(DISTINCT LoginID) AS UserCount
FROM Login
WHERE CONVERT(date, LoginTime) < CONVERT(date, GETDATE())
GROUP BY CONVERT(date, LoginTime)  
ORDER BY ActivityDate DESC; 

It generates following output:

ActivityDate | UserCount
----------------------
2019-02-21   | 2
2019-02-20   | 3
2019-02-19   | 2
2019-02-15   | 2
2019-02-14   | 1
2019-02-13   | 2
2019-02-12   | 3

My expectation is to have all last 7 days in a sequence (not like as current output where date 2019-02-16, 2019-02-17 and 2019-02-18 are missing after 2019-02-19). I need that, if a date is missing it must be displayed with 0 count.

My expected output is as below:

ActivityDate | UserCount
----------------------
2019-02-21   | 2
2019-02-20   | 3
2019-02-19   | 2
2019-02-18   | 0
2019-02-17   | 0
2019-02-16   | 0
2019-02-15   | 2
Salman A
  • 262,204
  • 82
  • 430
  • 521
Muhammad Waheed
  • 1,048
  • 1
  • 13
  • 30

4 Answers4

4

To see a particular value, the value must come from a row. So to see dates that don't exist on your login table, you must generated them as rows somewhere.

You can use a simple recursive CTE to generate 1 row per day between a particular interval, then use a LEFT JOIN to join logins that match on that particular day. The ones that don't match will still be displayed, since we are using LEFT JOIN.

DECLARE @GeneratingDateFrom DATE = DATEADD(DAY, -7, GETDATE())
DECLARE @GeneratingDateTo DATE = GETDATE()

;WITH GeneratedDates AS
(
    SELECT
        GeneratedDate = @GeneratingDateFrom

    UNION ALL

    SELECT
        GeneratedDate = DATEADD(DAY, 1, G.GeneratedDate)
    FROM
        GeneratedDates AS G
    WHERE
        DATEADD(DAY, 1, G.GeneratedDate) < @GeneratingDateTo
)
SELECT
    G.GeneratedDate,
    count(distinct L.LoginID) as UserCount
FROM 
    GeneratedDates AS G
    LEFT JOIN [Login] AS L ON G.GeneratedDate = CONVERT(date, L.LoginTime)
GROUP BY
    G.GeneratedDate
ORDER BY 
    G.GeneratedDate desc
EzLo
  • 13,780
  • 10
  • 33
  • 38
  • 4
    A recursive CTE is one of the [worst ways to generate a sequential list](https://sqlperformance.com/2013/01/t-sql-queries/generate-a-set-2). For only 7 days this is not likely to be an issue, but for any future readers looking to scale this solution up to years/decades this will not scale well at all. – GarethD Feb 22 '19 at 11:49
  • 3
    @GarethD true, a calendar table or a numbers tally with `DATEADD` would be way better in this case. – EzLo Feb 22 '19 at 11:50
  • depends what you mean by 'worst'. The CTE method is mitigated by being totally self-contained and maintenance free, it relies on no other table or view to provide the dates. You'd want to consider overall performance, not just the speed of generating the dates – Cato Feb 22 '19 at 11:57
  • 2
    @Cato Most of the solutions in the article I posted a link to are also totally self contained, and rely on no other tables. The fastest approach is the adaptation of Itzik Ben-Gan's "Stacked CTE", which offers all the advantages that you have stated, without the overhead of an RBAR loop. Recursive CTEs are not much more than syntactic sugar for a while loop, if you wouldn't use a `WHILE` loop to perform a task, you shouldn't use a recursive CTE. – GarethD Feb 22 '19 at 12:01
  • @GarethD - using system tables which may in future end up with less rows in them is questionable for critical work, I'd class those as not really self-contained. The stacked CTE makes sense for generation of large numbers of rows, I might want to balance its greater code size against how many rows I wanted to generate. I should be able to ask a computer to count to 365 if I query a year, I'm not likely to need 50000 which is ~140 years – Cato Feb 22 '19 at 12:19
  • You are correct it is personally preference. I personally don't find copy and pasting a stacked CTE to be much of a problem, and would also never chose shorter code over more efficient code. But if you were so inclined, and knew the solution would never need to scale then there would be little or no harm in using a recursive CTE, but there would also be little or no benefit either. Since this is as much a resource site as it is a Q&A to specific problems, I think it is better to either use the most efficient approach or be explicit about the limitations of the suggested approach. – GarethD Feb 22 '19 at 12:35
  • To clarify, this is not a criticism of this answer, I am just being explicit about the limitations. One of the [answers on the page](https://stackoverflow.com/a/54825973/1048425) is case in point for this warning. It contains a direct copy and paste from [an answer I wrote 5 years ago](https://stackoverflow.com/a/23291758/1048425), but does not reference it. Were someone to do the same with this answer, to answer a question about generating a list of 50,000 numbers, it could propagate bad advice. – GarethD Feb 22 '19 at 12:35
3

You can try this. Here you need to get first minimum date and maximum date. After that you need to generate all dates between that two days. And last you need to join both table.

declare @MinDate date
declare @MaxDate date

select * into #temp from(
select top (7) CONVERT(date,LoginTime) as ActivityDate,count(distinct LoginID) as UserCount
        from Login
        where CONVERT(date,LoginTime )< convert(date,getdate())
        group by CONVERT(date,LoginTime )  
        order by ActivityDate desc; 
)a        

Set @MinDate = (select min (ActivityDate) from #temp)
Set @MaxDate = (select max (ActivityDate) from #temp)  

Select a.Date, isnull(b.UserCount,0) as UserCount from(
SELECT  TOP (DATEDIFF(DAY, @MinDate, @MaxDate) + 1)
        Date = DATEADD(DAY, ROW_NUMBER() OVER(ORDER BY a.object_id) - 1, @MinDate)
FROM    sys.all_objects a
        CROSS JOIN sys.all_objects b;
)a left join #temp b on a.Date = b.ActivityDate

You can find the live demo Here. I have inserted your query output into a temp table but the logic is same.

Suraj Kumar
  • 5,547
  • 8
  • 20
  • 42
3

It is just 7 days so just type those dates:

SELECT ActivityDate, COUNT(DISTINCT LoginID) AS UserCount
FROM (VALUES
    (CAST(CURRENT_TIMESTAMP - 1 AS DATE)), -- build the list of dates
    (CAST(CURRENT_TIMESTAMP - 2 AS DATE)),
    (CAST(CURRENT_TIMESTAMP - 3 AS DATE)),
    (CAST(CURRENT_TIMESTAMP - 4 AS DATE)),
    (CAST(CURRENT_TIMESTAMP - 5 AS DATE)),
    (CAST(CURRENT_TIMESTAMP - 6 AS DATE)),
    (CAST(CURRENT_TIMESTAMP - 7 AS DATE))
) datelist(ActivityDate)
LEFT JOIN Login ON CAST(LoginTime AS DATE) = ActivityDate
GROUP BY ActivityDate
ORDER BY ActivityDate DESC
Salman A
  • 262,204
  • 82
  • 430
  • 521
1

The best way to generate dates that do not have rows in your table in to join on a calendar table.

Here's an extremely simple calendar table of one year, based on this answer:

CREATE TABLE [Calendar]
(
    [CalendarDate] DATETIME
)

DECLARE @StartDate DATETIME
DECLARE @EndDate DATETIME
SET @StartDate = GETDATE()
SET @EndDate = DATEADD(d, 365, @StartDate)

WHILE @StartDate <= @EndDate
      BEGIN
             INSERT INTO [Calendar]
             (
                   CalendarDate
             )
             SELECT
                   @StartDate

             SET @StartDate = DATEADD(dd, 1, @StartDate)
      END

(You can modify this query to add many more dates in the future, so that it won't need to be maintained for a while.)

Now you can join on the calendar table in your query like this:

select top (7) c.CalendarDate as ActivityDate,count(distinct LoginID) as UserCount
from Calendar c
left join Login l
    ON c.CalendarDate = CONVERT(date, l.LoginTime)
    and CONVERT(date,LoginTime )< convert(date,getdate())
group by c.CalendarDate 
order by c.CalendarDate desc; 

It's worth the space it takes up, it will come in handy in many other cases as well.

elizabk
  • 480
  • 2
  • 11