SQL group table by "leading rows" without pl/sql

Question

I have this table (short example) with two columns

1 a
2 a
3 a3
4 a
5 a
6 a6
7 a
8 a8
9 a

and I would like to group/partition them into groups separated by those leading "a", ideally to add another column like this, so I can address those groups easily.

problem is that setup of the table is dynamic so I can't use staticly lag or lead functions, any ideas how to do this without pl/sql in postgres version 9.5

Since PL/SQL is the *Oracle* procedural language, you'll find it surprisingly simple to keep it out of your Postgres query. You meant PL/pgSQL? — Erwin Brandstetter, Jun 29 '16 at 16:48
yes I ment PL/pgSQL, I am just so used to pl/sql when addressing procedural language for sql — Baker, Jun 30 '16 at 08:08

Erwin Brandstetter · Answer 1 · 2022-02-06T01:08:29.047

Assuming the leading part is a single character. Hence the expression right(data, -1) works to extract the group name. Adapt to your actual prefix.

The solution uses two window functions, which can't be nested. So we need a subquery or a CTE.

SELECT id, data
     , COALESCE(first_value(grp) OVER (PARTITION BY grp_nr ORDER BY id), '0') AS grp
FROM (
   SELECT *, NULLIF(right(data, -1), '') AS grp
        , count(NULLIF(right(data, -1), '')) OVER (ORDER BY id) AS grp_nr
   FROM   tbl
   ) sub;

Produces your desired result exactly.

NULLIF(right(data, -1), '') to get the effective group name or NULL if none.

count() only counts non-null values, so we get a higher count for every new group in the subquery.

In the outer query, we take the first grp value per grp_nr as group name and default to '0' with COALESCE for the first group without name (which has a NULL as group name so far).

We could use min() or max() as outer window function as well, since there is only one non-null value per partition anyway. first_value() is probably cheapest since the rows are sorted already.

Note the group name grp is data type text. You may want to cast to integer, if those are clean (and reliably) integer numbers.

score 1 · Accepted Answer · answered Jun 29 '16 at 16:33

1

This can be achieved by setting rows containing a to a specific value and all the other rows to a different value. Then use a cumulative sum to get the desired number for the rows. The group number is set to the next number when a new value in the val column is encountered and all the proceeding rows with a will have the same group number as the one before and this continues.

I assume that you would need a distinct number for each group and the number doesn't matter.

select id, val, sum(ex) over(order by id) cm_sum
from (select t.*
      ,case when val = 'a' then 0 else 1 end ex
      from t) x

The result for the query above with the data in question, would be

id  val cm_sum
--------------
1   a   0
2   a   0
3   a3  1
4   a   1
5   a   1
6   a6  2
7   a   2
8   a8  3
9   a   3

answered Jun 29 '16 at 16:33

Vamsi Prabhala

48,685
4
36
58

Thanks, your is really simple ........ and actually there is no need for that sub query and that case can be put directly into window function sum – Baker Jun 30 '16 at 08:31
@Baker: Simple. And incorrect. The question didn't ask for a serial number, but for the actual string (or number) contained in the column. – Erwin Brandstetter Jun 30 '16 at 14:44
Yes @Erwin Brandstetter, you are right, your answer is 100% correct answer to my question by it's definition, ... however this solution serve the purpose I needed, because it's offering a way to separate those groups – Baker Jul 04 '16 at 08:54

score 0 · Answer 3 · answered Jun 29 '16 at 15:54

With the given data, you can use a cumulative max:

select . . .,
       coalesce(max(substr(col2, 2)) over (order by col1), 0)

If you don't strictly want the maximum, then it gets a bit more difficult. The ANSI solution is to use the IGNORE NULLs option on LAG(). However, Postgres does not (yet) support that. An alternative is:

select . . ., coalesce(substr(reft.col2, 2), 0)
from (select . . .,
             max(case when col2 like 'a_%' then col1 end) over (order by col1) as ref_col1
      from t
     ) tt join
     t reft
     on tt.ref_col1 = reft.col1

score 0 · Answer 4 · answered Jun 29 '16 at 17:43

You can also try this :

 with mytable as (select split_part(t,' ',1)::integer id,split_part(t,' ',2) myvalue 
       from (select unnest(string_to_array($$1 a;2 a;3 a3;4 a;5 a;6 a6;7 a;8 a8;9 a$$,
    ';'))t) a)

  select id,myvalue,myresult from mytable join (
     select COALESCE(NULLIF(substr(myvalue,2),''),'0') myresult,idmin id_down
            ,COALESCE(lead(idmin) over (order by myvalue),999999999999) id_up 
   from (
     select myvalue,min(id) idmin from mytable group by 1
    ) a) b 
  on id between id_down and id_up-1

SQL group table by "leading rows" without pl/sql

4 Answers4

Linked