Find the longest streak of perfect scores per player

Question

I have the following result from a SELECT query with ORDER BY player_id ASC, time ASC in a PostgreSQL database:

player_id  points  time

395        0       2018-06-01 17:55:23.982413-04
395        100     2018-06-30 11:05:21.8679-04
395        0       2018-07-15 21:56:25.420837-04
395        100     2018-07-28 19:47:13.84652-04
395        0       2018-11-27 17:09:59.384-05
395        100     2018-12-02 08:56:06.83033-05
399        0       2018-05-15 15:28:22.782945-04
399        100     2018-06-10 12:11:18.041521-04
454        0       2018-07-10 18:53:24.236363-04
675        0       2018-08-07 20:59:15.510936-04
696        0       2018-08-07 19:09:07.126876-04
756        100     2018-08-15 08:21:11.300871-04
756        100     2018-08-15 16:43:08.698862-04
756        0       2018-08-15 17:22:49.755721-04
756        100     2018-10-07 15:30:49.27374-04
756        0       2018-10-07 15:35:00.975252-04
756        0       2018-11-27 19:04:06.456982-05
756        100     2018-12-02 19:24:20.880022-05
756        100     2018-12-04 19:57:48.961111-05

I'm trying to find each player's longest streak where points = 100, with the tiebreaker being whichever streak began most recently. I also need to determine the time at which that player's longest streak began. The expected result would be:

player_id  longest_streak  time_began

395        1               2018-12-02 08:56:06.83033-05
399        1               2018-06-10 12:11:18.041521-04
756        2               2018-12-02 19:24:20.880022-05

You should find the solution here, with window functions : https://www.postgresql.org/docs/9.1/tutorial-window.html — Arnaud Peralta, Jun 13 '19 at 15:01
Is a streak interrupted by rows from other players? Also: your version of Postgres? — Erwin Brandstetter, Jun 13 '19 at 15:30

Erwin Brandstetter · Accepted Answer · 2019-06-13T21:53:38.537

A gaps-and-islands problem indeed.

Assuming:

"Streaks" are not interrupted by rows from other players.
All columns are defined NOT NULL. (Else you have to do more.)

This should be simplest and fastest as it only needs two fast row_number() window functions:

SELECT DISTINCT ON (player_id)
       player_id, count(*) AS seq_len, min(ts) AS time_began
FROM  (
   SELECT player_id, points, ts
        , row_number() OVER (PARTITION BY player_id ORDER BY ts) 
        - row_number() OVER (PARTITION BY player_id, points ORDER BY ts) AS grp
   FROM   tbl
   ) sub
WHERE  points = 100
GROUP  BY player_id, grp  -- omit "points" after WHERE points = 100
ORDER  BY player_id, seq_len DESC, time_began DESC;


I use the column name ts instead of time, which is a reserved word in standard SQL. Postgres allows it, but with limitations, and it is still a bad idea to use it as an identifier.

The "trick" is to subtract row numbers so that consecutive rows fall in the same group (grp) per (player_id, points). Then filter the ones with 100 points, aggregate per group and return only the longest, most recent result per player.
Basic explanation for the technique:

Select longest continuous sequence
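
To see why the subtraction produces one constant value per streak, here is a minimal Python sketch of the same idea. It is only an illustration, not part of the query: the helper name label_groups is hypothetical, and the rows are player 756's data from the question with timestamps shortened to minutes and time zone offsets dropped.

```python
from collections import defaultdict

# Player 756 from the question: (player_id, points, ts), ts shortened (assumption).
rows = [
    (756, 100, "2018-08-15 08:21"),
    (756, 100, "2018-08-15 16:43"),
    (756, 0,   "2018-08-15 17:22"),
    (756, 100, "2018-10-07 15:30"),
    (756, 0,   "2018-10-07 15:35"),
    (756, 0,   "2018-11-27 19:04"),
    (756, 100, "2018-12-02 19:24"),
    (756, 100, "2018-12-04 19:57"),
]

def label_groups(rows):
    """Append grp = row number per player minus row number per (player, points)."""
    rn_player = defaultdict(int)
    rn_player_points = defaultdict(int)
    labeled = []
    for player_id, points, ts in sorted(rows, key=lambda r: (r[0], r[2])):
        rn_player[player_id] += 1
        rn_player_points[(player_id, points)] += 1
        grp = rn_player[player_id] - rn_player_points[(player_id, points)]
        labeled.append((player_id, points, ts, grp))
    return labeled

labeled = label_groups(rows)

# Mimic WHERE points = 100 ... GROUP BY player_id, grp: count(*) and min(ts).
streaks = {}
for player_id, points, ts, grp in labeled:
    if points == 100:
        length, began = streaks.get((player_id, grp), (0, ts))
        streaks[(player_id, grp)] = (length + 1, min(began, ts))

for row in labeled:
    print(row)
print(streaks)
```

Note that grp on its own is not unique: a run of 0-point rows can end up with the same number as a neighbouring run of 100-point rows. That is why the grouping is per (player_id, points), and why filtering on points = 100 first makes it safe to group by player_id and grp alone.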

We can use GROUP BY and DISTINCT ON in the same SELECT because GROUP BY is applied before DISTINCT ON. Consider the sequence of events in a SELECT query:

Best way to get result count before LIMIT was applied

About DISTINCT ON:

Select first row in each GROUP BY group?
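
The order of evaluation can be imitated in a few lines of Python. This is a rough sketch under assumptions, not a description of how Postgres executes the query: the list below stands in for the rows that the GROUP BY step would produce for players 395 and 756 (timestamps shortened to minutes), and distinct_on_player is a hypothetical helper that plays the role of DISTINCT ON (player_id) with ORDER BY player_id, seq_len DESC, time_began DESC.

```python
# One tuple per streak of 100s: (player_id, seq_len, time_began).
# Assumed output of the GROUP BY step for the question's sample data.
grouped = [
    (395, 1, "2018-06-30 11:05"),
    (395, 1, "2018-07-28 19:47"),
    (395, 1, "2018-12-02 08:56"),
    (756, 2, "2018-08-15 08:21"),
    (756, 1, "2018-10-07 15:30"),
    (756, 2, "2018-12-02 19:24"),
]

def distinct_on_player(streaks):
    """Keep the first row per player after sorting by length and start time, both descending."""
    # Sort by (seq_len, time_began) descending, then by player_id ascending.
    # list.sort is stable, so the descending order survives within each player.
    ordered = sorted(streaks, key=lambda s: (s[1], s[2]), reverse=True)
    ordered.sort(key=lambda s: s[0])
    first_per_player = {}
    for streak in ordered:
        first_per_player.setdefault(streak[0], streak)
    return list(first_per_player.values())

result = distinct_on_player(grouped)
print(result)
```

The aggregation has to be finished before any row is discarded. Otherwise the "first row per player" would be picked from raw score rows rather than from whole streaks.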

D-Shih · Answer 2 · 2019-06-13T15:14:37.913

This is a gaps-and-islands problem. You can use a conditional SUM as a window function to get a group number for each streak, then apply the MAX and COUNT window functions per group.

Query 1:

WITH CTE AS (
    SELECT *,
           SUM(CASE WHEN points = 100 THEN 1 END) OVER(PARTITION BY player_id ORDER BY time) - 
           SUM(1) OVER(ORDER BY time) RN
    FROM T
)
SELECT player_id,
       MAX(streak_end) latest_streak_end,
       MAX(cnt) longest_streak 
FROM (
  SELECT player_id,
         MAX(time) OVER(PARTITION BY rn,player_id) streak_end, 
         COUNT(*) OVER(PARTITION BY rn,player_id)  cnt
  FROM CTE 
  WHERE points > 0
) t1
GROUP BY player_id

Results:

| player_id |           latest_streak_end | longest_streak |
|-----------|-----------------------------|----------------|
|       756 | 2018-12-04T19:57:48.961111Z |              2 |
|       399 | 2018-06-10T12:11:18.041521Z |              1 |
|       395 |  2018-12-02T08:56:06.83033Z |              1 |

Gordon Linoff · Answer 3 · 2019-06-13T16:34:48.927

One way to do this is to count the rows between the previous and the next non-100 result. To get the lengths of the streaks:

with s as (
      select s.*,
             row_number() over (partition by player_id order by time) as seqnum,
             count(*) over (partition by player_id) as cnt
      from scores s
     )
select s.*,
       coalesce(next_seqnum, cnt + 1) - coalesce(prev_seqnum, 0) - 1 as length
from (select s.*,
             -- latest non-100 row at or before this one
             max(seqnum) filter (where points <> 100) over (partition by player_id order by time) as prev_seqnum,
             -- earliest non-100 row at or after this one (note the descending order)
             min(seqnum) filter (where points <> 100) over (partition by player_id order by time desc) as next_seqnum
      from s
     ) s
where points = 100;

You can then incorporate the other conditions:

with s as (
      select s.*,
             row_number() over (partition by player_id order by time) as seqnum,
             count(*) over (partition by player_id) as cnt
      from scores s
     ),
     streaks as (
      select s.*,
             next_seqnum - prev_seqnum - 1 as length,
             max(next_seqnum - prev_seqnum - 1) over (partition by player_id) as max_length
      from (select s.*,
                   coalesce(max(seqnum) filter (where points <> 100) over (partition by player_id order by time), 0) as prev_seqnum,
                   coalesce(min(seqnum) filter (where points <> 100) over (partition by player_id order by time desc), cnt + 1) as next_seqnum
            from s
           ) s
      where points = 100
     )
-- keep the first row of each longest streak, then the most recent one per player
select distinct on (player_id) s.*
from streaks s
where length = max_length and
      seqnum = prev_seqnum + 1
order by player_id, time desc;
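
The distance idea can also be checked outside SQL. The following Python sketch is an illustration under assumptions, not a translation of the query: streak_lengths is a hypothetical helper, it handles a single player's scores already sorted by time, and the sample list is player 756's points column from the question.

```python
def streak_lengths(points):
    """Map the 1-based position of every 100 to the length of the streak it belongs to."""
    n = len(points)

    # prev_seqnum[i]: position of the latest non-100 at or before i, 0 if there is none.
    prev_seqnum = {}
    last = 0
    for i in range(1, n + 1):
        if points[i - 1] != 100:
            last = i
        prev_seqnum[i] = last

    # next_seqnum[i]: position of the earliest non-100 at or after i, n + 1 if there is none.
    next_seqnum = {}
    upcoming = n + 1
    for i in range(n, 0, -1):
        if points[i - 1] != 100:
            upcoming = i
        next_seqnum[i] = upcoming

    # A streak is everything strictly between its two surrounding non-100 rows.
    return {
        i: next_seqnum[i] - prev_seqnum[i] - 1
        for i in range(1, n + 1)
        if points[i - 1] == 100
    }

# Player 756, ordered by time.
lengths = streak_lengths([100, 100, 0, 100, 0, 0, 100, 100])
print(lengths)
```

The defaults 0 and n + 1 act as imaginary non-100 rows before the first and after the last score, which is the role of the coalesce(..., 0) and coalesce(..., cnt + 1) expressions.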

score 0 · Answer 4 · answered Nov 25 '21 at 03:36

Here is my answer:

select 
user_id,
non_streak,
streak,
ifnull(non_streak,streak) strk,
max(time) time
from (

Select
user_id,time,
points,
lag(points) over (partition by user_id order by time) prev_point,
case when points + lag(points) over (partition by user_id order by time) = 100  then 1 end as non_streak,
case when points + lag(points) over (partition by user_id order by time) > 100  then 1 end as streak


From players
) where ifnull(non_streak,streak) is not null
group by 1,2,3
order by 1,2 
) group by user_id

Please consider adding a brief explanation of [how and why this solves the problem](https://meta.stackoverflow.com/q/392712/13138364). This will help readers to better understand your solution. — tdy, Nov 25 '21 at 04:26
