2

I want to calculate the longest "streak" of every user within 60 days from this mysql table. Streak means there is an entry for the user on this day.

+-----+------------+---------------------+
| id  | user       | date                |
+-----+------------+---------------------+
|   3 | test1      | 2014-06-10 23:55:01 |
|   4 | test2      | 2014-06-10 02:01:06 |
|   5 | test1      | 2014-06-11 23:55:06 |
|   6 | test2      | 2014-06-11 23:55:07 |
|   7 | test1      | 2014-06-12 23:55:07 |
|   9 | test1      | 2014-06-13 23:55:07 |
|   10| test2      | 2014-06-13 23:55:07 |

The output should look like this:

test1  4
test2  2 no entry on  2014-06-12

But I don´t know how to do this correctly.

juergen d
  • 201,996
  • 37
  • 293
  • 362
PatrickPirker
  • 359
  • 3
  • 15

1 Answers1

4

One way to do this is to use MySQL user variables. This isn't necessarily the most efficient approach for large sets, since it materializes two inline views.

SELECT s.user
     , MAX(s.streak) AS longest_streak
  FROM ( SELECT IF(@prev_user = o.user AND o.date = @prev_date + INTERVAL 1 DAY
                  , @streak := @streak + 1
                  , @streak := 1
                ) AS streak
              , @prev_user := o.user AS user
              , @prev_date := o.date AS `date`
           FROM ( SELECT t.user
                       , DATE(t.date) AS `date`
                    FROM mytable t
                   CROSS
                    JOIN (SELECT @prev_user := NULL, @prev_date := NULL, @streak := 1) i
                   WHERE t.date >= DATE(NOW()) + INTERVAL -60 DAY
                   GROUP BY t.user, DATE(t.date)
                   ORDER BY t.user, DATE(t.date)
                ) o
       ) s
 GROUP BY s.user

The inline view aliased as i just initializes some user variables; we don't really care what it returns, except that we need it to return exactly 1 row because of the JOIN operation; we just really care about the side effect of initializing user variables early in the statement execution.

The inline view aliased as o gets a list of users and dates; the specification was for an entry "on each date", so we can truncate off the time portion, and get just the DATE, and make that into a distinct set, using the GROUP BY clause.

The inline view aliased as s processes each row, and saves the values of the current row into the @prev_ user variables. Before it overwrites the values, it compares the values on the current row to the values (saved) from the previous row. If the user matches, and the date on the current row is exactly 1 day later than the previous date, we are continuing a "streak", so we increment the current value of the @streak variable by 1. Otherwise, the previous streak was broken, and we start a new "streak", resetting @streak to 1.

Finally, we process the rows from s to extract the maximum streak for each user.

(This statement is desk checked only, there could be a typo or two.)

spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • I just modified the query, I missed subtracting 60 days on the predicate that was supposed to get the past 60 days; I also removed unnecessary DATE() functions in the inline view s, since we already handle that earlier. I also added aliases to columns returned by inline view s. – spencer7593 Jun 12 '14 at 23:02
  • I was doing the same thing but little slow here is mine [**demo**](http://www.sqlfiddle.com/#!2/f0e95/2) – M Khalid Junaid Jun 12 '14 at 23:06
  • @M Khalid Junaid: yes, it's essentially the same approach. Just a couple of comments: the `TIMESTAMPDIFF(DAY,` function will return 0 if the second and third arguments are less than 24 hours apart, or can return a value of 1 if the arguments are over 24 hours but less than 48 hours apart. (I assumed OP wanted to consider just the "date" portion in the comparison, and handled the condition when a user had multiple entries for same date.) Test cases with a series of entries 23 hours apart, or 25 hours apart would be sufficient to demonstrate the difference. – spencer7593 Jun 12 '14 at 23:23
  • SQL Fiddle Demo here: [http://sqlfiddle.com/#!2/4d0c7/1](http://sqlfiddle.com/#!2/4d0c7/1) – spencer7593 Jun 12 '14 at 23:40
  • This is getting me on the right track. I'll re-read and study it tomorrow. +1. Thank you! – Ryan Sep 04 '18 at 02:27