5

I have a complex problem that seems to be trivial at first sight:

  • for a given 90 day window, how many distinct active users did I have?

The table I will use to query this is the login table (hosted in Redshift), and it has a timestamp with the logintime and usertoken as the user identifier.

Whenever I want to answer this for a single day, the query is easy and straightforward:

select count (distinct usertoken)
from logins
where datediff('d',logintime,getdate()) <= 90

The problem becomes complex because I want to have this in a table with the number for every given date.

07/07  100k
07/06  98k
07/05  99k
07/04  101k

(...)

Window functions do not help me because I need to count distinct, and this is not possible in a window function.

To my knowledge, there is no way to iterate in a SQL query.

How should I go about this?

  • Please elaborate on `every given date`. Do you pass lower and upper bound for the time range? Or an array of random dates? Can there be duplicate entries for a single user on the same day? How big is your table? Number of rows? Bytes per row / table definition. – Erwin Brandstetter Jul 08 '16 at 13:47

3 Answers3

0

The trivial way is very computationally expensive:

select days.d, count(distinct l.userid)
from (select distinct date_trunc('day', logintime) as d
      from logins l
     ) days left join
     (select distinct userid, date_trunc('day', logintime) as d
      from logins
     ) l
     on datediff('d', l.d, days.d) between 0 and 89
group by days.d
order by days.d;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Perhaps I am missing something but from what I understand this should do :

-- In SQL Server

select cast(logintime As Date) , count (distinct usertoken) from logins where datediff(D,logintime,getdate()) <= 90 Group by cast(logintime As Date)

in PostGreSQL Change cast(logintime As Date) to trunc_Date(Day, logintime ) and datediff(D,logintime,getdate()) to datediff('d',logintime,getdate())

objectNotFound
  • 1,683
  • 2
  • 18
  • 25
0

I am assuming that if a day has zero users logging in you don't mind not showing it in the list.

First we get a set of all the days we care about and call that set "days".

  with days as (
     select date_trunc('day', date) as day from logins
     where date > now() - '90 days'::interval
     group by day 
    )

Then we join the days set with the logins.

select day, count(distinct userid)
from days
join  logins on date_trunc('day', logins.date) = days.day
group by day
order by day
user1852503
  • 4,747
  • 2
  • 20
  • 28