in stack-overflow, there's a badge called "Enthusiast" which is earned by "Visited the site each day for 30 consecutive days"
how to write this query in sql server?
in stack-overflow, there's a badge called "Enthusiast" which is earned by "Visited the site each day for 30 consecutive days"
how to write this query in sql server?
Create a table with
Id, LastVisit, DaysConsecutivelyVisited
and just update the table appropriately on each visit. The logic is clear, and there is no need for an ugly SQL query to otherwise extract the desired information.
You may gain some insight on the challenges here by reading the questions tagged enthusiast-badge on meta.stackoverflow.com.
One important issue is what constitutes a "visit" to the site. Many of us stay logged in for days at a time. So they aren't counting logins. They have a special definition of what constitutes a visit. You will have to define what a visit is.
A source of confusion is the definition of the date. I believe that SO is using GMT, which means that the date changes during the middle of the local date for many users. If you have users in other time zones, you will have to designate the time zone you are using for your date.
If you are fortunate enough to be able to base your counts on logins, and your users are in the same time zone, your recordkeeping is relatively easy. You just record each login, and then use SQL as suggested by Jeremiah Peschka to look for gaps.
If you must track visits, and not logins, you will have to define what a visit is, and create a separate database table to log visits. Obviously, there will be a lot of logging to that table, and you might want to consider the performance hit of all of those database calls vs. the value of having this information.
As long as you have a column where you keep track of the raw days where a user has visited the site, you can write a query to look for gaps in the data. A specific query would be pretty lengthy to put in here (no doubt someone smarter than me will come up with a short example that proves me wrong), but it's very possible, just a little bit time consuming to write the code to detect.
You could simply have a table that tracks logins, and number of days since last login (literally nothing more than: userid, logindate, dayssincelastlogin). This then means all you need to do is query that table, making sure that the number of days since last login is less than or equal to 1 for the last 30 days.
Assuming you have a table visits(uid, timestamp), I would take the generous route, and say that the person could not have visited every day if any closest pair of visits was greater than 48 hours apart. Other, I will assume that they did (they could, in fact, go one night and two mornings later or something, but whatever, that can be adjusted). You could use other definitions of day pretty easily, since this just returns the max distance between two points.
Jeremiah's link is probably more efficient for large datasets, but the simpler query is:
SELECT uid, MAX(dist) FROM
(SELECT v1.uid AS uid, MIN(v1.timestamp-v2.timestamp) AS dist
FROM visits v1 LEFT JOIN visits v2
ON v1.uid = v2.uid
WHERE $dateRangeLimiter
GROUP BY v1.uid, v1.timestamp) WHERE uid = $targetUid
This will get the greatest distance between two time-points within the given WHERE clause and UID specification. If you're just looking for one uid, that WHERE should be pushed into the inner SELECT; as it stands the query will do this for all users.
It's not the most efficient way, but if that's what you have, and it's not that big of a data set, it should work nicely.
Let's asssume you already have a table or view logindays
with (user, day)
containing one row per user and day visited (do this by rounding login times and grouping). DOK already explains the issues with the definition of 'day'.
Here goes:
select user, min(day) enthusiast_since from
( select user, day, lag(day, 29) over (partition by user order by day) daydiff from logindays )
where day - daydiff = 29
group by user;
(29 because day 1 is 29 days before day 30)
The most accurate way is to log the user's activity. This way there is no confusion about "is he logged in" for those who remain logged in by inactive for days (which shouldn't count).
For maintenance, delete all log records > 30 days.
Query the table for the greatest time distance between any two log entries for any given user in the last 30 days. If the max result is <= 1, he wins.
I'm sure you can work out how to write the query.