The special difficulty is that we need some of the rows with ('foo', TRUE)
in two groups - at the end of one and at the start of the next. So we have to add another instance of those.
To make it simpler to understand, and also to work with a stable set of rows, I put your sample values into a temporary table:
CREATE TEMP TABLE t AS
SELECT id, value
,CASE WHEN value = 'foo' AND random() < 0.5
THEN TRUE ELSE null
END AS to_group
FROM (
SELECT id, CASE WHEN random() < 0.2 THEN 'foo' ELSE 'bar' END AS value
FROM generate_series(1,100) id
) sub;
Using the boolean
data type for the flag to make it a bit simpler.
Then my query is reduced to:
WITH cte AS (
SELECT *, count(value = 'foo' OR NULL) OVER (ORDER BY id) AS grp
FROM t
)
SELECT grp, min(id) AS min_id, max(id) AS max_id
FROM (
SELECT id, value, to_group, grp FROM cte
UNION ALL
SELECT id, value, to_group, grp - 1 FROM cte WHERE to_group
) sub
GROUP BY grp
HAVING count(value = 'foo' OR NULL) = 2
ORDER BY grp;
Explain
In the CTE cte
I add a running count of rows grp
with value = 'foo'
. Other rows in between get the same number:
- The expression
value = 'foo' OR NULL
is NULL for all values that are not 'foo'
.
count()
only counts non-null values.
- Members of your groups now have the same
grp
number, plus the next row be to_group
.
- Detailed explanation for this technique:
Compute percents from SUM() in the same SELECT sql query
As mentioned, the special difficulty is that we need some rows twice. So we add another instance with a UNION ALL
in the subquery sub
. While being at it, I decrease the grp
number of the copies by 1 so your groups are now complete.
The final SELECT can now just GROUP BY grp
.
Valid groups have two rows with value = 'foo'
.