
A table defined by the following query gives output like the one shown in the screenshot below:

select id,
       value,
       case when value = 'foo' and random() <= 0.5
            then 't' else null
       end as to_group
from  (
   select generate_series(1,100) as id,
          case when random() <= 0.2 then 'foo' else 'bar' end as value
   ) t1

[screenshot of the sample output]

How can I group all rows tagged with 'foo' and 't' together with the preceding 'foo' row (no matter whether it has to_group = 't' or not) and all enclosed 'bar' rows?
In the given example those rows are 33 - 37.
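
For illustration, a hypothetical slice around that example, consistent with the description above (the actual values are random per run):

 id | value | to_group
----+-------+----------
 33 | foo   |
 34 | bar   |
 35 | bar   |
 36 | bar   |
 37 | foo   | t

Rows 33 - 37 should end up in one group.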

  • provide your sample output – Vivek S. Aug 19 '14 at 09:03
  • BTW: this is a repost of a question yesterday. (which probably got deleted, because I cannot see it from here) – joop Aug 19 '14 at 09:30
  • Hi joop: I deleted it and tried to rewrite it in a clearer way ;) – Berlin_J Aug 19 '14 at 09:34
  • Well: you failed to make it any clearer: it is unclear what you mean by *grouping*. As @dude said, you should add your intended output to the question. (BTW: it looks like a gaps-and-islands type of problem and could probably be solved by a recursive query) – joop Aug 19 '14 at 09:39

2 Answers


The special difficulty is that we need some of the rows with ('foo', TRUE) in two groups - at the end of one and at the start of the next. So we have to add another instance of those.

To make it simpler to understand, and also to work with a stable set of rows, I put your sample values into a temporary table:

CREATE TEMP TABLE t AS
SELECT id, value
      ,CASE WHEN value = 'foo' AND random() < 0.5
         THEN TRUE ELSE null
       END AS to_group
FROM  (
   SELECT id, CASE WHEN random() < 0.2 THEN 'foo' ELSE 'bar' END AS value
   FROM   generate_series(1,100) id
   ) sub;

I use the boolean data type for the flag to make it a bit simpler.
Then my query is reduced to:

WITH cte AS (
   SELECT *, count(value = 'foo' OR NULL) OVER (ORDER BY id) AS grp
   FROM  t
   )
SELECT grp, min(id) AS min_id, max(id) AS max_id
FROM  (
   SELECT id, value, to_group, grp     FROM cte
   UNION  ALL
   SELECT id, value, to_group, grp - 1 FROM cte WHERE to_group
   ) sub
GROUP  BY grp
HAVING count(value = 'foo' OR NULL) = 2
ORDER  BY grp;

Explanation

  • In the CTE cte I add a running count of rows grp with value = 'foo'. Other rows in between get the same number:

    • The expression value = 'foo' OR NULL is NULL for all values that are not 'foo'.
    • count() only counts non-null values.
    • Members of one of your groups now share the same grp number; only the closing 'foo' row (the one with to_group) is already counted with the next group.
    • Detailed explanation for this technique:
      Compute percents from SUM() in the same SELECT sql query
  • As mentioned, the special difficulty is that we need some rows twice. So we add another instance of each with a UNION ALL in the subquery sub. While at it, I decrease the grp number of the copies by 1, so your groups are now complete (see the sketch below this list).

  • The final SELECT can now just GROUP BY grp.
    Valid groups have two rows with value = 'foo'.
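
To make both steps concrete, here is a minimal, self-contained sketch. It runs the same query against a little CTE standing in for the temp table t, filled with hypothetical rows modeled on the 33 - 37 example from the question:

WITH t(id, value, to_group) AS (
   VALUES (33, 'foo', NULL::bool)
        , (34, 'bar', NULL)
        , (35, 'bar', NULL)
        , (36, 'bar', NULL)
        , (37, 'foo', TRUE)
        , (38, 'bar', NULL)
   )
, cte AS (
   -- running count of 'foo' rows: ids 33-36 get grp = 1, ids 37-38 get grp = 2
   SELECT *, count(value = 'foo' OR NULL) OVER (ORDER BY id) AS grp
   FROM   t
   )
SELECT grp, min(id) AS min_id, max(id) AS max_id
FROM  (
   SELECT id, value, to_group, grp     FROM cte
   UNION  ALL
   SELECT id, value, to_group, grp - 1 FROM cte WHERE to_group  -- copy row 37 into grp 1
   ) sub
GROUP  BY grp
HAVING count(value = 'foo' OR NULL) = 2  -- keep only complete groups (two 'foo' rows)
ORDER  BY grp;

This should return a single group: grp = 1 spanning min_id = 33 to max_id = 37; the incomplete trailing group (rows 37 - 38) is filtered out by the HAVING clause.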

– Erwin Brandstetter

The following solution does exactly what I wanted:

create temp table t as
select id,
       value,
       case when value = 'foo' and random() <= 0.5
            then 't' else null
       end as to_group
from  (
   select generate_series(1,100) as id,
          case when random() <= 0.2 then 'foo' else 'bar' end as value
   ) t1;



select array_agg(id)
from (
    select *, sum(group_flag) over (order by id) as group_nr
    from (
        -- flag rows that open a new group with 1; rows that should be merged
        -- into the preceding group get null, so the running sum() above does
        -- not advance for them
        select *,
               case when (to_group = 't' and value = 'foo')
                      or (next_to_group = 't' and value = 'bar') then null
                    else '1'::integer
               end as group_flag
        from (
            -- keep one row per id: the nearest following 'foo' row
            -- (relies on the order by of the subquery below)
            select distinct on (id) id, value, to_group, foo_id, next_to_group
            from (
                -- attach every later 'foo' row to each row
                select *
                from t
                left join (
                    select id as foo_id, to_group as next_to_group
                    from t
                    where value = 'foo'
                    ) n on n.foo_id > t.id
                order by id, foo_id
                ) t1
            ) t2
        ) t3
    ) t4
group by group_nr;
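
The core trick is the same running-aggregate idea as in the other answer: rows that should be merged into the preceding group get a NULL flag, and since sum() ignores NULLs the running sum does not advance for them. A minimal sketch on a hypothetical slice of flags matching the 33 - 37 example (row 33 opens the group, rows 34 - 37 continue it):

SELECT id, group_flag
     , sum(group_flag) OVER (ORDER BY id) AS group_nr
FROM  (
   VALUES (32, 1), (33, 1), (34, NULL), (35, NULL), (36, NULL), (37, NULL), (38, 1)
   ) v(id, group_flag);
-- ids 33 - 37 all get the same group_nr, so the outer
-- array_agg(id) ... GROUP BY group_nr collapses them into one array.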
– Berlin_J