2

I have a query in which I want to group by some fields and aggregate the last field within a CSV string. If you come from SQL Server like me, you would use FOR XML PATH(''). But in Oracle 12c that's a different story :

Table definition

CREATE TABLE HCF (
ID NUMBER,
HCF_DATE DATE,
HCF_TYPE_1 NUMBER,
HCF_TYPE_2 NUMBER)

Sample data

ID  HCF_DATE    HCF_TYPE_1 HCF_TYPE_2
272 27/02/18    1          1
279 28/02/18    15         2
280 28/02/18    15         2
283 28/02/18    5          1

The query I'm using

WITH CTE_HCF AS (
SELECT HCF_DATE, HCF_TYPE_1, HCF_TYPE_2, COUNT(ID)
FROM HCF
GROUP BY HCF_DATE, HCF_TYPE_1, HCF_TYPE_2
HAVING COUNT(ID) > 0
)

SELECT a.*, b.*
FROM CTE_HCF a
CROSS APPLY (
    SELECT LTRIM(MAX(SYS_CONNECT_BY_PATH(ORDRE_ID,',')) KEEP (DENSE_RANK LAST ORDER BY curr),',') AS ids
    FROM ( SELECT HCF_DATE, HCF_TYPE_1, HCF_TYPE_2, ID,
            ROW_NUMBER() OVER (PARTITION BY HCF_DATE ORDER BY HCF_TYPE_1, HCF_TYPE_2) AS curr,
            ROW_NUMBER() OVER (PARTITION BY HCF_DATE ORDER BY HCF_TYPE_1, HCF_TYPE_2) -1 AS prev
            FROM CTE_HCF
            WHERE HCF_DATE = a.HCF_DATE AND HCF_TYPE_1 = a.HCF_TYPE_1 AND HCF_TYPE_2 = a.HCF_TYPE_2
        )
    CONNECT BY prev = PRIOR curr
    AND HCF_DATE = PRIOR HCF_DATE
    AND HCF_TYPE_1 = PRIOR HCF_TYPE_1
    AND HCF_TYPE_2 = PRIOR HCF_TYPE_2
    START WITH curr = 1 ) b

The error

ORA-00904: "a"."HCF_TYPE_2" :  invalid identifier

The desired output

HCF_DATE    HCF_TYPE_1 HCF_TYPE_2 IDS
27/02/18    1          1          272
28/02/18    15         2          279,280
28/02/18    5          1          283

I think that the problem is that fields from the original query are not visible in the sub query of the CROSS APPLY sub query.

PS : I have tried other ways described in this article but I fail for multiple reasons like for LISTAGG it exceeds the xK number of characters. And I don't have sufficient privileges to create a function and XMLAGG might bring down your Oracle instance.

Update Oracle version is : Oracle Database 12c Enterprise Edition Release 12.1.0.2.0

William Robertson
  • 15,273
  • 4
  • 38
  • 44
Fourat
  • 2,366
  • 4
  • 38
  • 53
  • 1
    *"I think that the problem is that fields from the original query are not visible in the sub query"* Yeah, the attributes are only visible one layer down. So you could reference them in the WHERE clause of the inline view `b` but you're trying to reference them in a subquery nested within `b`. – APC Apr 30 '19 at 16:59
  • `XMLAGG` might bring down your instance? – Chris Saxon May 01 '19 at 11:12
  • @APC referencing `HCF_TYPE_1` and `HCF_TYPE_2` in WHERE clause of the inline view `b` returns null. Also `ID` is `NUMBER` and there's no conversion to `CLOB`. Technically my data couldn't blow the max string size because when I calculate it manually the max CSV `IDS` is 2466 characters though the VARCHAR2 maximum is 32767 characters. – Fourat May 02 '19 at 07:42
  • @ChrisSaxon `XMLAGG` consumes a lot of resources, I tested on a data sample and it used the hole available CPU. Check Erik Božič's comment on this answer : https://stackoverflow.com/a/29008456/3759822 – Fourat May 02 '19 at 07:46
  • 1
    When I said *"you could reference them in the WHERE clause of the inline view `b`"* what I meant was it's legal to reference them there, not that it would solve your problem. But then again, if technically your data *"couldn't blow the max string size because when I calculate it manually the max CSV IDS is 2466 characters"* what problem are you trying to solve? This is starting to sound like [an XY question](https://meta.stackexchange.com/q/66377/133829). – APC May 02 '19 at 08:19

2 Answers2

1

If you're blowing out the character limit for listagg, how you handle this depends on what you want to display.

If you're on 12.2, you can use the on overflow truncate clause to trim characters that exceed the limit.

select hcf_date, hcf_type_1, hcf_type_2, 
       listagg ( id, ',' 
         on overflow truncate 
       ) within group ( 
         order by id
       ) csv
from   hcf
group  by hcf_date, hcf_type_1, hcf_type_2;

HCF_DATE               HCF_TYPE_1   HCF_TYPE_2   CSV       
27-FEB-2018 00:00:00                1             1 272        
28-FEB-2018 00:00:00                5             1 283        
28-FEB-2018 00:00:00               15             2 279,280 

If you're on 12.1 there are some other workarounds.

You can use row pattern matching (match_recognize) to find the length of the CSV for each extra row. And return those within the string limit:

I've added a couple of extra rows and set the character limit to 10 to show the principle:

insert into hcf values ( 281, to_date('28/02/18', 'dd/mm/yy'), 15, 2);
insert into hcf values ( 282, to_date('28/02/18', 'dd/mm/yy'), 15, 2);

with grps as ( 
  select *
  from   hcf match_recognize (
    partition by hcf_date, hcf_type_1, hcf_type_2
    order by id
    measures 
      sum(lengthb(s.id) + lengthb(';')) as len
    all rows per match
    after match skip past last row
    pattern (s+)
    define 
      s as 1=1
  )
)
  select hcf_date, hcf_type_1, hcf_type_2,
         listagg ( id, ',' ) 
           within group ( 
             order by id
           ) csv
  from   grps
  where  len <= 10
  group  by hcf_date, hcf_type_1, hcf_type_2;

HCF_DATE               HCF_TYPE_1   HCF_TYPE_2  CSV       
27-FEB-2018 00:00:00            1            1  272        
28-FEB-2018 00:00:00            5            1  283        
28-FEB-2018 00:00:00           15            2  279,280    

Or you could split the rows into separate groups when you hit the character limit. And show them as separate CSVs:

with grps as ( 
  select *
  from   hcf match_recognize (
    partition by hcf_date, hcf_type_1, hcf_type_2
    order by id
    measures 
      match_number() as grp
    all rows per match
    after match skip past last row
    pattern (s csv*)
    define csv as 
      lengthb(s.id) + sum(lengthb(csv.id) + lengthb(';')) < = 10
  )
)
  select hcf_date, hcf_type_1, hcf_type_2,
         listagg ( id, ',' ) 
           within group ( 
             order by id
           ) csv
  from   grps
  group  by hcf_date, hcf_type_1, hcf_type_2, grp;

HCF_DATE               HCF_TYPE_1   HCF_TYPE_2  CSV       
27-FEB-2018 00:00:00            1            1  272        
28-FEB-2018 00:00:00            5            1  283        
28-FEB-2018 00:00:00           15            2  279,280    
28-FEB-2018 00:00:00           15            2  281,282  

If you want to return the whole CSV list longer than the varchar2 limit, you need to return a clob. Which you can do with XML:

select hcf_date, hcf_type_1, hcf_type_2,
       substr (
          xmlcast ( 
            xmlagg (
              xmlelement(s, ',' || id)
              order by id
            ) as clob
          ), 2
        ) csv
from    hcf
group  by hcf_date, hcf_type_1, hcf_type_2;  

HCF_DATE               HCF_TYPE_1   HCF_TYPE_2   CSV               
27-FEB-2018 00:00:00            1             1  272                
28-FEB-2018 00:00:00            5             1  283                
28-FEB-2018 00:00:00           15             2  279,280,281,282 
Chris Saxon
  • 9,105
  • 1
  • 26
  • 42
  • I'm running 12.1 and truncating data is not an option for me, I have to return all the data. So I guess the best option according to your answer is returning a `CLOB` instead of a `varchar2` but how can I do it with `LISTAGG` ? – Fourat May 02 '19 at 07:59
  • You can't. `Listagg` only returns `varchar2` (or `raw`). The limit for PL/SQL varchar2s is 32,767. The SQL limit is 4,000 bytes - unless you've enabled [extended data types](https://oracle-base.com/articles/12c/extended-data-types-12cR1) – Chris Saxon May 02 '19 at 10:15
0

Finally I managed to ask the DBA to create a function that concatenates id values in CSV, which I personally think is the best approach to return a CLOB value.

The function :

create or replace FUNCTION concatenate_list (p_cursor IN  SYS_REFCURSOR)
  RETURN  CLOB
IS
  l_return  CLOB; 
  l_temp    CLOB;
BEGIN
  LOOP
    FETCH p_cursor
    INTO  l_temp;
    EXIT WHEN p_cursor%NOTFOUND;
    l_return := l_return || ',' || l_temp;
  END LOOP;
  RETURN LTRIM(l_return, ',');
END;

The query

WITH CTE_HCF AS (
SELECT HCF_DATE, HCF_TYPE_1, HCF_TYPE_2, COUNT(ID)
FROM HCF
GROUP BY HCF_DATE, HCF_TYPE_1, HCF_TYPE_2
HAVING COUNT(ID) > 0
)

SELECT a.*
  , concatenate_list(CURSOR(SELECT id FROM HCF WHERE HCF_DATE = a.HCF_DATE AND HCF_TYPE_1 = a.HCF_TYPE_1 AND HCF_TYPE_2 = a.HCF_TYPE_2)) AS CSV
FROM CTE_HCF a

Source

PS : If you don't need a CLOB then LISTAGG is your best choice.

Fourat
  • 2,366
  • 4
  • 38
  • 53