-1

My problem boils down to the following. I have a table with some natural sequencing, and in it I have a key value which may repeat over time. I want to find the blocks where the key is the same, then changes, and then comes back to being the same. Example:

  1. A
  2. A
  3. B
  4. B
  5. B
  6. C
  7. C
  8. A
  9. A
  10. C
  11. C

Here I want the result to be

  1. A, 1-2
  2. B, 3-5
  3. C, 6-7
  4. A, 8-9
  5. C, 10-11

so I can't use that key value A, B, C to group by, because the same key can appear multiple times, I just want to squeeze out repetitive occurrences that are uninterrupted.

Needless to say, I want the simplest SQL one can come up with. It would use OLAP window functions.

I am usually pretty good with complicated SQL, but with sequences I am not so good. I will work on this a little bit myself, of course, and annex some ideas below this question in a subsequent edit.

Let's begin by defining the table for our discussion:

CREATE TABLE Seq (
  num integer,
  key char
);

UPDATE 1: doing some research I find a similar question here: How to find consecutive rows based on the value of a column? but both the question and the answers are wrapped up into a lot of extra stuff and confusing.

UPDATE 2: I already got one answer, thanks. Inspecting it now. Here is my test I am typing into PostgreSQL even as we speak:

CREATE TABLE Seq ( num int, key char );
INSERT INTO Seq VALUES 
   (1, 'A'), (2, 'A'), 
   (2, 'B'), (3, 'B'), (5, 'B'), 
   (6, 'C'), (7, 'C'), 
   (8, 'A'), (9, 'A'), 
   (10, 'C'), (11, 'C');

UPDATE 3: First contender of a solution is this

SELECT key, min(num), max(num)
  FROM (
    SELECT seq.*,
           row_number() over (partition by key order by num) as seqnum
      FROM Seq 
  ) s
  GROUP BY key, (num - seqnum) 
  ORDER BY min;

yields:

 key | min | max
-----+-----+-----
 A   |   1 |   2
 B   |   2 |   3
 B   |   5 |   5
 C   |   6 |   7
 A   |   8 |   9
 C   |  10 |  11
(6 rows)

for some reason B repeats twice, I see why, I made a "mistake" in my test data, skipping sequence num 4 and going straight from 3 to 5.

This mistake is fortunate, because it allows me to point out that while in this example the sequence number is discrete, I am intending the sequence to arise from some continuous domain (e.g., time).

There is another "mistake" I made, in that I have num 2 repeated. Is that allowable? Probably not. So cleaning up the example, removing duplicate but leaving the gap:

DROP TABLE Seq;
CREATE TABLE Seq ( num int, key char );
INSERT INTO Seq VALUES 
   (1, 'A'), (2, 'A'), 
   (3, 'B'), (4, 'B'), (6, 'B'), 
   (7, 'C'), (8, 'C'), 
   (9, 'A'), (10, 'A'), 
   (11, 'C'), (12, 'C');

this still leaves us with the duplicate B block:

 key | min | max
-----+-----+-----
 A   |   1 |   2
 B   |   3 |   4
 B   |   6 |   6
 C   |   7 |   8
 A   |   9 |  10
 C   |  11 |  12
(6 rows)

Now going with that first intuition by Gordon Linoff and trying to understand it and add to it:

SELECT s.*, num - seqnum AS diff 
  FROM (
    SELECT seq.*,
           row_number() over (partition by key order by num) as seqnum
      FROM Seq
    ) s
  ORDER BY num;

here is the num - seqnum trick before grouping:

 num | key | seqnum | diff
-----+-----+--------+------
   1 | A   |      1 |    0
   2 | A   |      2 |    0
   3 | B   |      1 |    2
   4 | B   |      2 |    2
   6 | B   |      3 |    3
   7 | C   |      1 |    6
   8 | C   |      2 |    6
   9 | A   |      3 |    6
  10 | A   |      4 |    6
  11 | C   |      3 |    8
  12 | C   |      4 |    8
(11 rows)

I doubt that this is the answer quite yet.

Gunther Schadow
  • 1,490
  • 13
  • 22

2 Answers2

1

This answers the original problem.

You can enumerate the rows for each key and subtract that from num. Voila! This is number is constant when the key is constant on adjacent rows:

select key, min(num), max(num)
from (select seq.*,
             row_number() over (partition by key order by num) as seqnum
      from seq
     ) s
group by key, (num - seqnum);

Here is a db<>fiddle showing that it works.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • not quite, I am showing the results of your query above. – Gunther Schadow Mar 14 '21 at 16:05
  • @GuntherSchadow . . . You changed the data -- and the definition of the problem. The original problem implied (very strongly) that the nums had to differ by 1 to be in the same group and were gapless and sequential. You entirely changed the problem; asking a *new* question is better than invalidating answers. – Gordon Linoff Mar 14 '21 at 17:15
1

Because of gaps you can't use num directly as Gordon's solution suggested. Row_number it too.

select key, min(num), max(num)
from (select seq.*,
             row_number() over (order by  num) as rn, 
             row_number() over (partition by key order by num) as seqnum
      from seq
     ) s
group by key, (rn - seqnum)
order by min(num);
Serg
  • 22,285
  • 5
  • 21
  • 48