0

I am currently trying to get a distinct count for customers over a 90 day rolling period. I have got the amount using sum amount and over partition. However, when I do this with count distinct, SQL doesn't have functionality.

I have attempted to use row_number() with the over partition and use rows current row and 90 preceding but this also isn't available.

Would greatly appreciate any suggested work around to resolve this problem.

I have attempted to solve the problem using 2 approaches, both which have failed based on the limitations outlined above.

Approach 1

select date 
       ,count(distinct(customer_id)) over partition () order by date rows current row and 89 preceding as cust_count_distinct

from table

Approach 2

select date 
       ,customer_id
       ,row_number() over partition (customer_id) order by date rows current row and 89 preceding as rn
from table

-- was then going to filter for rn = '1' but the rows functionality not possible with ranking function windows.

egnkc
  • 11
  • 1
  • 2
    Possible duplicate of [Partition Function COUNT() OVER possible using DISTINCT](https://stackoverflow.com/questions/11202878/partition-function-count-over-possible-using-distinct) – GSerg Feb 12 '19 at 13:31
  • What have you tried till now? – Maciej Los Feb 12 '19 at 13:31
  • 2
    Hello egnkc, and welcome to Stack Overflow :). Can you [edit] your question and add your SQL statement please? – TT. Feb 12 '19 at 13:31
  • To help you, we need you to [edit](https://stackoverflow.com/posts/54651203/edit) your question and add theses informations : Your current query - The table(s) structure(s) (the `CREATE TABLE` scripts are really welcome) - Some sample datas - your expected output - the current output – Cid Feb 12 '19 at 13:31
  • 1
    Sample data and desired results would help. – Gordon Linoff Feb 12 '19 at 13:38
  • @GSerg - I don't see that as a duplicate, the issue with sliding windows and `DISTINCT` is different. Because the problem is that as days enter and leave the window it requires some evaluation to know whether or not a customer is entering or leaving the window as a whole – Martin Smith Feb 12 '19 at 13:45
  • @MartinSmith I thought the issue is that `distinct` is [not supported](https://learn.microsoft.com/en-us/sql/t-sql/functions/count-transact-sql?view=sql-server-2017#syntax) in principle when `over` is used. Won't the [accepted answer](https://stackoverflow.com/a/22347502/11683) work regardless of an explicitly defined `rows between`? – GSerg Feb 12 '19 at 13:47
  • @GSerg - No I can't see how that would work. The problem is more difficult than keeping track of an incrementing counter. If Bob shops on day 1 and day 45 they should be included in the `distinct` count until day 135 then drop out – Martin Smith Feb 12 '19 at 13:49
  • @MartinSmith I believe the `dense_rank` approach would have handled it because the row that keeps Bob alive is always visible in the window. The problem is that `dense_rank` does not support `rows between` at all (which I overlooked). – GSerg Feb 12 '19 at 14:20

1 Answers1

2

The simplest method is a correlated subquery of some sort:

select d.date, c.nt
from (select distinct date from t) d cross apply
     (select count(distinct customerid) as cnt
      from t t2
      where t2.date >= dateadd(day, -89, d.date) and
            t2.date <= d.date
     ) c;

This is not particularly efficient (i.e. a killer) on even a medium data set. But it might serve your needs.

You can restrict the dates being returned to test to see if it works.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786