1

Every time a logged in user visits the website their data is put into a table containing the userId and date (either one or zero row per user per day):

   444631 2011-11-07
   444631 2011-11-06
   444631 2011-11-05
   444631 2011-11-04
   444631 2011-11-02
   444631 2011-11-01

I need to have ready access to the number of consecutive visits when I pull the user data from the main user table.. In the case for this user, it would be 4.

Currently I'm doing this through a denormalized consecutivevisits counter in the main user table, however for unknown reasons it sometimes resets.. I want to try an approach that uses exclusively the data in the table above.

What's the best SQL query to get that number (4 in the example above)? There are users who have hundreds of visits, we have millions of registered users and hits per day.

EDIT: As per the comments below I'm posting the code I currently use to do this; it however has the problem that it sometimes resets for no reason and it also reset it for everyone during the weekend, most likely because of the DST change.

// Called every page load for logged in users
public static function OnVisit($user)
{
    $lastVisit = $user->GetLastVisit(); /* Timestamp; db server is on the same timezone as www server */
    if(!$lastVisit)
        $delta = 2;
    else
    {
        $today = date('Y/m/d');

        if(date('Y/m/d', $lastVisit) == $today)
            $delta = 0;
        else if(date('Y/m/d', $lastVisit + (24 * 60 * 60)) == $today)
            $delta = 1;
        else
            $delta = 2;
    }

    if(!$delta)
        return;

    $visits = $user->GetConsecutiveVisits();
    $userId = $user->GetId();

            /* NOTE: t_dailyvisit is the table I pasted above. The table is unused;
             * I added it only to ensure that the counter sometimes really resets
             * even if the user visits the website, and I could confirm that. */
    q_Query("INSERT IGNORE INTO `t_dailyvisit` (`user`, `date`) VALUES ($userId, CURDATE())", DB_DATABASE_COMMON);

    /* User skipped 1 or more days.. */
    if($delta > 1)
        $visits = 1;
    else if($delta == 1)
        $visits += 1;

    q_Query("UPDATE `t_user` SET `consecutivevisits` = $visits, `lastvisit` = CURDATE(), `nvotesday` = 0 WHERE `id` = $userId", DB_DATABASE_COMMON);
    $user->ForceCacheExpire();
}
Andreas Bonini
  • 44,018
  • 30
  • 122
  • 156
  • Is it possible to change it to store the last visit date, days so far, and maximum consecutive? – xthexder Nov 07 '11 at 19:57
  • @xthexder: that's how it is now.. But I wanted to change it to this system so it's "error proof"; the daily visit table has existed for a while now (but unused). Rarely, sometimes the consecutive days counter resets for a bug that I cannot replicate while the table clearly shows that the count should be higher. So if I use the table for the data, then the bug won't occur anymore. – Andreas Bonini Nov 07 '11 at 20:00
  • A more efficient solution would be to post your existing code and fix it, rather than using this, more resource intensiive, solution. – xthexder Nov 07 '11 at 20:04

2 Answers2

3

I missed the mysql tag and wrote up this solution. Sadly, this does not work in MySQL as it does not support window functions.

I post it anyway, as I put some effort into it. Tested with PostgreSQL. Would work similarly with Oracle or SQL Server (or any other decent RDBMS that supports window functions).

Test setup

CREATE TEMP TABLE v(id int, visit date);
INSERT INTO v VALUES
 (444631, '2011-11-07')
,(444631, '2011-11-06')
,(444631, '2011-11-05')
,(444631, '2011-11-04')
,(444631, '2011-11-02')
,(444631, '2011-11-01')
,(444632, '2011-12-02')
,(444632, '2011-12-03')
,(444632, '2011-12-05');

Simple version

-- add 1 to "difference" to get number of days of the longest period
SELECT id, max(dur) + 1 as max_consecutive_days
FROM (

   -- calculate date difference of min and max in the group
   SELECT id, grp, max(visit) - min(visit) as dur
   FROM (

      -- consecutive days end up in a group
      SELECT *, sum(step) OVER (ORDER BY id, rn) AS grp
      FROM   (

         -- step up at the start of a new group of days
         SELECT id
               ,row_number() OVER w AS rn
               ,visit
               ,CASE WHEN COALESCE(visit - lag(visit) OVER w, 1) = 1
                THEN 0 ELSE 1 END AS step
         FROM   v
         WINDOW w AS (PARTITION BY id ORDER BY visit)
         ORDER  BY 1,2
         ) x
      ) y
      GROUP BY 1,2
   ) z
GROUP  BY 1
ORDER  BY 1
LIMIT  1;

Output:

   id   | max_consecutive_days
--------+----------------------
 444631 |                    4

Faster / Shorter

I later found an even better way. grp numbers are not continuous (but continuously rising). Doesn't matter, since those are just a mean to an end:

SELECT id, max(dur) + 1 AS max_consecutive_days
FROM (
    SELECT id, grp, max(visit) - min(visit) AS dur
    FROM (
      -- subtract an integer representing the number of day from the row_number()
      -- creates a "group number" (grp) for consecutive days
      SELECT id
            ,EXTRACT(epoch from visit)::int / 86400
           - row_number() OVER (PARTITION BY id ORDER BY visit) AS grp
            ,visit
      FROM   v
      ORDER  BY 1,2
      ) x
    GROUP BY 1,2
    ) y
GROUP  BY 1
ORDER  BY 1
LIMIT  1;

SQL Fiddle for both.

More

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
1

If it is not necessary to have a log of every day the user was logged on to the webiste and you only want to know the consecutive days he was logged on, I would prefer this way:

Chose 3 columns: LastVisit (Date), ConsecutiveDays (int) and User.

On log-in you check the entry for the user, determine if last visit was "Today - 1", then add 1 to the columns ConsecutiveDays and store "Today" in column LastVisit. If last vist is greater than "Today - 1" then store 1 in ConsecutiveDays.

HTH

Fischermaen
  • 12,238
  • 2
  • 39
  • 56