9

I need something similar to these 2 SO questions, but using Informix SQL syntax.

My data coming in looks like this:

id     codes

63592  PELL
58640  SUBL
58640  USBL
73571  PELL
73571  USBL
73571  SUBL

I want to see it come back like this:

id     codes 

63592  PELL
58640  SUBL, USBL
73571  PELL, USBL, SUBL

See also group_concat() in Informix.

Community
  • 1
  • 1
CheeseConQueso
  • 5,831
  • 29
  • 93
  • 126

7 Answers7

28

I believe that the answer you need is a user-defined aggregate, similar to this one:

CREATE FUNCTION gc_init(dummy VARCHAR(255)) RETURNING LVARCHAR;
    RETURN '';
END FUNCTION;

CREATE FUNCTION gc_iter(result LVARCHAR, value VARCHAR(255))
    RETURNING LVARCHAR;
    IF result = '' THEN
        RETURN TRIM(value);
    ELSE
        RETURN result || ',' || TRIM(value);
    END IF;
END FUNCTION;

CREATE FUNCTION gc_comb(partial1 LVARCHAR, partial2 LVARCHAR)
    RETURNING LVARCHAR;
    IF partial1 IS NULL OR partial1 = '' THEN
        RETURN partial2;
    ELIF partial2 IS NULL OR partial2 = '' THEN
        RETURN partial1;
    ELSE
        RETURN partial1 || ',' || partial2;
    END IF;
END FUNCTION;

CREATE FUNCTION gc_fini(final LVARCHAR) RETURNING LVARCHAR;
    RETURN final;
END FUNCTION;

CREATE AGGREGATE group_concat
    WITH (INIT = gc_init, ITER = gc_iter,
          COMBINE = gc_comb, FINAL = gc_fini);

Given a table of elements (called elements) with a column called name containing (funnily enough) the element name, and another column called atomic_number, this query produces this result:

SELECT group_concat(name) FROM elements WHERE atomic_number < 10;

Hydrogen,Helium,Lithium,Beryllium,Boron,Carbon,Nitrogen,Oxygen,Fluorine

Applied to the question, you should obtain the answer you need from:

SELECT id, group_concat(codes)
    FROM anonymous_table
    GROUP BY id;

CREATE TEMP TABLE anonymous_table
(
    id      INTEGER NOT NULL,
    codes   CHAR(4) NOT NULL,
    PRIMARY KEY (id, codes)
);

INSERT INTO anonymous_table VALUES(63592, 'PELL');
INSERT INTO anonymous_table VALUES(58640, 'SUBL');
INSERT INTO anonymous_table VALUES(58640, 'USBL');
INSERT INTO anonymous_table VALUES(73571, 'PELL');
INSERT INTO anonymous_table VALUES(73571, 'USBL');
INSERT INTO anonymous_table VALUES(73571, 'SUBL');
INSERT INTO anonymous_table VALUES(73572, 'USBL');
INSERT INTO anonymous_table VALUES(73572, 'PELL');
INSERT INTO anonymous_table VALUES(73572, 'SUBL');

SELECT id, group_concat(codes)
    FROM anonymous_table
    GROUP BY id
    ORDER BY id;

The output from that is:

58640 SUBL,USBL
63592 PELL
73571 PELL,SUBL,USBL
73572 PELL,SUBL,USBL

The extra set of data was added to test whether insert sequence affected the result; it appears not to do so (the codes are in sorted order; I'm not sure whether there's a way to alter - reverse - that order).


Notes:

  1. This aggregate should be usable for any type that can be converted to VARCHAR(255), which means any numeric or temporal type. Long CHAR columns and blob types (BYTE, TEXT, BLOB, CLOB) are not handled.
  2. The plain LVARCHAR limits the aggregate size to 2048 bytes. If you think you need longer lengths, specify LVARCHAR(10240) (for 10 KiB), for example.
  3. As of Informix 12.10.FC5, the maximum length that works seems to be 16380; anything longer seems to trigger SQL -528: Maximum output rowsize (32767) exceeded, which surprises me.
  4. If you need to remove the aggregate, you can use:

    DROP AGGREGATE IF EXISTS group_concat;
    DROP FUNCTION IF EXISTS gc_fini;
    DROP FUNCTION IF EXISTS gc_init;
    DROP FUNCTION IF EXISTS gc_iter;
    DROP FUNCTION IF EXISTS gc_comb;
    
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • Had cause to use this in the real world, and found that the order of the concatenated values is not consistent. In other words it can't be used reliably to build a histogram of combinations. With a sufficiently large dataset, you will get a mixture of PELL,USBL,SUBL codes and PELL,SUBL,USBL codes. – RET Feb 11 '10 at 05:08
  • On reflection, this is a limitation in the COMBINE function. You would need to break the string back to an array, sort it and join it back together again. That's trivial in perl, but pretty dark magic in SPL... – RET Feb 11 '10 at 05:46
  • @RET: Thanks for the heads up. I agree that sorting in SPL is a non-trivial exercise (and that is being polite). And it is interesting to know that the ordering I saw was just an artefact of the data set. I did try a fair number of twists - but I'm not surprised to hear that the ordering was still coincidental. – Jonathan Leffler Feb 11 '10 at 06:11
  • 2 years ago and still relevant. Served my purpose exactly though, thanks for sharing this - little chance I would have figured this out from the manual. – Trent Apr 03 '13 at 16:37
  • @RET wouldn't you be able to sort with `order by id, codes` ? or `order by id, case codes when 'PELL' then 1 when 'USBL' then 2 else 3 end;` ? You don't have to do everything with aggregate functions – nurettin Jun 26 '15 at 06:14
  • 1
    @nurettin: I don't think so. If you try `SELECT id, group_concat(codes) FROM anonymous_table GROUP BY id ORDER BY id, codes` you get `-19828: ORDER BY column or expression must be in SELECT list in this context`. If you add `codes` to the GROUP BY, you get one row in each group. If you try to add `codes` to the select-list, you get told `-294: The column (codes) must be in the GROUP BY list`, and if you fix that, you get the wrong result again. So, overall, the answer is No. – Jonathan Leffler Jun 26 '15 at 06:21
  • This topic was very very useful to me, and to give my two cents, when I used the function I changed the gc_comb function to this: CREATE FUNCTION gc_comb(partial1 LVARCHAR, partial2 LVARCHAR) RETURNING LVARCHAR; IF (partial1 <> '' and partial2<>'') THEN RETURN partial1 || ',' || partial2; ELIF (partial1<>'' and partial2='') THEN RETURN partial1; ELIF (partial2<>'' and partial1='') THEN RETURN partial2; ELSE RETURN ''; END IF; END FUNCTION; This is just to treat cases when appear blank/null values and not return a serie of blanks separated like this ,,,,, – Carlos Lacerda Dec 28 '15 at 16:50
  • @Carlos: Thanks. So you are dealing with the cases where either `partial1` or `partial2` or both are empty strings, or are you concerned about NULL, or both? If a NULL was encountered, then the `||` operator would generate a null. For the other cases, if one or the other branch of multiple parallel branches generated an empty result, I suppose `gc_comb` could be called upon to combine one or two empty results. To some extent, that's a variation of the problem identified by RET in an earlier [comment](https://stackoverflow.com/questions/715350/C/716655?noredirect=1#comment2199455_716655). – Jonathan Leffler Dec 28 '15 at 16:56
  • @Leffler: you are right this not solve the problem mentioned by RET, in my case is just to avoid a serie of ",,,," without values, perhaps the ordering problem can be solved using a ordered temp table before use of GROUP_CONCAT but I dont know is just a kick :P Thank you. – Carlos Lacerda Dec 28 '15 at 21:36
  • 1
    @CarlosLacerda: I've updated the answer with a slightly streamlined version of the code you outlined in your comment. – Jonathan Leffler Jan 03 '16 at 23:07
  • @JonathanLeffler your last update on group_concat aggregate causes it to return lvarchar which needs to be trimmed and casted to varchar when being concatenated with other strings. That's sometimes useful, too. So I made two versions: group_concat and group_concat_l . – nurettin Apr 06 '16 at 13:12
  • @JonathanLeffler Question [SO 42723709](http://stackoverflow.com/questions/42723709/informix-one-to-many-format-issue) led me here and found your example for an user defined aggregate. Could the issue of the order of the aggregate be resolved using a `collection` to store the intermediate values? – Luís Marques Mar 13 '17 at 00:18
  • @LuísMarques: maybe, but you'd have to think carefully about the `gc_comb()` operation; it probably becomes a 'merge' rather than simple concatenation, and ditto `gc_fini()`. I'm not sure which collection would work best in this context — and that might have an impact on the cost of merging. By all means go ahead and write such a group concatenate, but I fear it won't be trivial. – Jonathan Leffler Mar 13 '17 at 01:50
  • @JonathanLeffler I posted what i came up with, using `SET`. It is slower as expected, but it also consumes a lot more memory (2 order of magnitude higher on my tests). So maybe not really practical. – Luís Marques Mar 14 '17 at 17:53
  • @JonathanLeffler Added an alternative implementation of the aggregate, using `BSON`, for your amusement and horror. – Luís Marques Mar 16 '17 at 13:51
  • The INIT and FINAL functions do nothing here and could be left out, no? – MPelletier Apr 25 '22 at 14:58
  • 1
    @MPelletier — You might be able to leave out the final function, but I'm not sure whether the string would be initialized to empty (rather than, say, NULL) if the init function was left out. Being a C programmer, I don't rely on things being initialized. I'm not sure whether you can omit unused functions in the aggregate definition — but it is quite probable that you can. It's not an area I've experimented with — I prefer not to live (that) dangerously. – Jonathan Leffler Apr 25 '22 at 15:04
  • And there it is, in our (Linz) source and linked to this article since 2015, which is why IBM don't document `GROUP_CONCAT` . – Rich Jul 23 '23 at 23:56
2

Oracle provides list aggregator function for such requirement.

SELECT id, LISTAGG(codes,',') as CODE_LIST FROM <TABLE> GROUP BY id

Output will be like

ID     CODE_LIST 
63592  PELL
58640  SUBL,USBL
73571  PELL,USBL,SUBL
2

I'm not sure about informix sql, but in MSSQL or Oracle, you could do this with the

DECODE or CASE keywords, by concatenating them together. However, this would require you to know all the potential values ahead of time, which is brittle.

I'm assuming the reason you don't like the STUFF keyword is because informix does not support it?

Oracle also supports the CONNECT BY keywords, which would work, but again may not be supported by informix.

Probably the best answer would be to build this output in your client/data layer, after the query. Is there a particular reason why this must be done in the query?

Jason Coyne
  • 6,509
  • 8
  • 40
  • 70
1

Building on Jonathan Leffler example and on RET comments about the ordering of the concatenated values, using Informix 12.10FC8DE, I came up with the following user aggregate:

CREATE FUNCTION mgc_init
(
    dummy VARCHAR(255)
)
RETURNING
    SET(LVARCHAR(2048) NOT NULL);

    RETURN SET{}::SET(LVARCHAR(2048) NOT NULL);

END FUNCTION;

CREATE FUNCTION mgc_iter
(
    p_result SET(LVARCHAR(2048) NOT NULL)
    , p_value VARCHAR(255)
)
RETURNING
    SET(LVARCHAR(2048) NOT NULL);

    IF p_value IS NOT NULL THEN
        INSERT INTO TABLE(p_result) VALUES (TRIM(p_value));
    END IF;

    RETURN p_result;

END FUNCTION;

CREATE FUNCTION mgc_comb
(
    p_partial1 SET(LVARCHAR(2048) NOT NULL)
    , p_partial2 SET(LVARCHAR(2048) NOT NULL)
)
RETURNING
    SET(LVARCHAR(2048) NOT NULL);

    INSERT INTO TABLE(p_partial1)
        SELECT vc1 FROM TABLE(p_partial2)(vc1);

    RETURN p_partial1;

END FUNCTION;

CREATE FUNCTION mgc_fini
(
    p_final SET(LVARCHAR(2048) NOT NULL)
)
RETURNING
    LVARCHAR;

    DEFINE l_str LVARCHAR(2048);
    DEFINE l_value LVARCHAR(2048);

    LET l_str = NULL;

    FOREACH SELECT vvalue1 INTO l_value FROM TABLE(p_final) AS vt1(vvalue1) ORDER BY vvalue1
        IF l_str IS NULL THEN
            LET l_str = l_value;
        ELSE
            LET l_str = l_str || ',' || l_value;
        END IF;
    END FOREACH;

    RETURN l_str;

END FUNCTION;
GRANT EXECUTE ON mgc_fini TO PUBLIC;

CREATE AGGREGATE m_group_concat
WITH
(
    INIT = mgc_init
    , ITER = mgc_iter
    , COMBINE = mgc_comb
    , FINAL = mgc_fini
);

The concatenated values will have no duplicates and will be ordered.

I used Informix collections, namely SET that does not allow duplicate values, to try to keep the code somewhat simple.

The method is to use SET's to keep the intermediate results (and eliminating the duplicates) and at the end build the concatenated string from the ordered values of the final SET.

The use of LVARCHAR for the SET elements is due to the fact that initially i was using VARCHAR but the memory consumption was very, very high. The documentation hints that internally Informix may be casting the VARCHAR to CHAR. I made the change and it did in fact lower the memory consumption (but it is still high).

However, this aggregate memory consumption is around 2 orders of magnitude higher than Jonathan's and about 2 times slower on the tests i conducted (using a table with around 300 000 rows).

So use with care. It consumes a lot of memory and it is not extensively tested ( it may be leaking memory somewhere ).

EDIT 1:

My previous code must be leaking a memory structure somewhere (or internally Informix keeps the collection derived tables around, and it can generated a lot of those).

So, still trying to avoid having to code the aggregate function in C , here is another alternative, using Informix BSON built in functions, that will use much less memory and be a bit faster.

CREATE FUNCTION m2gc_init
(
    dummy VARCHAR(255)
)
RETURNING
    BSON;

    RETURN '{"terms":[]}'::JSON::BSON;

END FUNCTION;

CREATE FUNCTION m2gc_iter
(
    p_result BSON
    , p_value VARCHAR(255)
)
RETURNING
    BSON;

    DEFINE l_add_array_element LVARCHAR(2048);

    IF p_value IS NOT NULL THEN
        LET l_add_array_element = '{ $addToSet: { terms: "' || TRIM(p_value) || '" } }';
        LET p_result = BSON_UPDATE(p_result, l_add_array_element);
    END IF;

    RETURN p_result;

END FUNCTION;

CREATE FUNCTION m2gc_comb
(
    p_partial1 BSON
    , p_partial2 BSON
)
RETURNING
    BSON;

    DEFINE l_array_elements LVARCHAR(2048);
    DEFINE l_an_element LVARCHAR(2048);
    DEFINE l_guard INTEGER;

    LET l_array_elements = NULL;
    LET l_guard = BSON_SIZE(p_partial2, 'terms.0');

    IF l_guard > 0 THEN
        WHILE l_guard > 0
            LET l_an_element = BSON_VALUE_LVARCHAR(p_partial2, 'terms.0');
            IF l_array_elements IS NULL THEN
                LET l_array_elements = '"' || l_an_element || '"';
            ELSE
                LET l_array_elements = l_array_elements || ', "' || l_an_element || '"';
            END IF;
            LET p_partial2 = BSON_UPDATE(p_partial2, '{ $pop: { terms: -1 } }');
            LET l_guard = BSON_SIZE(p_partial2, 'terms.0');
        END WHILE;
        LET l_array_elements = '{ $addToSet: { terms: { $each: [ ' || l_array_elements || ' ] } } }';        
        LET p_partial1 = BSON_UPDATE(p_partial1, l_array_elements);
    END IF;

    RETURN p_partial1;

END FUNCTION;


CREATE FUNCTION m2gc_fini
(
    p_final BSON
)
RETURNING
    LVARCHAR;

    DEFINE l_str_agg LVARCHAR(2048);
    DEFINE l_an_element LVARCHAR(2048);
    DEFINE l_iter_int INTEGER;
    DEFINE l_guard INTEGER;

    LET l_str_agg = NULL;
    LET l_guard = BSON_SIZE(p_final, 'terms.0');

    IF l_guard > 0 THEN
        LET p_final = BSON_UPDATE(p_final, '{ $push: { terms: { $each: [], $sort: 1 } } }');    
        LET l_iter_int = 0;
        WHILE l_guard > 0
            LET l_an_element = BSON_VALUE_LVARCHAR(p_final, 'terms.' || l_iter_int);
            IF l_str_agg IS NULL THEN
                LET l_str_agg = TRIM(l_an_element);
            ELSE
                LET l_str_agg = l_str_agg || ',' || TRIM(l_an_element);
            END IF;
            LET l_iter_int = l_iter_int + 1;
            LET l_guard = BSON_SIZE(p_final, 'terms.' || l_iter_int);
        END WHILE;
    END IF;
    RETURN l_str_agg;

END FUNCTION;

CREATE AGGREGATE m2_group_concat
WITH
(
    INIT = m2gc_init
    , ITER = m2gc_iter
    , COMBINE = m2gc_comb
    , FINAL = m2gc_fini
)
;

The aggregated return value will be ordered and without duplicates.

Again, this was not properly tested. It is a just a POC.

One of the problems is that it is not sanitizing the input values. Some of the BSON manipulating functions receive parameters that are being built by concatenating strings and non escaped characters can break those parameters. For example, a string value with quotes on it: 'I"BrokeIt') can provoke an assortment of errors (Assert Failures included).

And i am certain there are other problems.

However, memory consumption of this implementation is in the same order of magnitude as in Jonathan's example and around 60% slower (again, only very rudimentary testing was performed).

Luís Marques
  • 1,381
  • 1
  • 7
  • 10
1

Also, if informix allows you to create user-functions, you could create a function that returned a string with the concatenated value.

Jason Coyne
  • 6,509
  • 8
  • 40
  • 70
0

This solution support a separator specifiy by the user:

Example:

SELECT id, group_concat(codes, '+')
    FROM anonymous_table
    GROUP BY id
    ORDER BY id;

This is the same solution as Jonathan Leffler, but with support a separator. The default separator if not specify is the ','.

Code:

CREATE ROW TYPE group_concat_t
(
   result   LVARCHAR,
   separator VARCHAR(1)
);

CREATE FUNCTION gc_init(dummy VARCHAR(255), separator VARCHAR(1) default ',' ) RETURNING group_concat_t;
    RETURN ROW('',separator)::group_concat_t;
END FUNCTION; 

CREATE FUNCTION gc_iter(result group_concat_t, value VARCHAR(255))
    RETURNING group_concat_t;
    IF result.result = '' THEN
        RETURN ROW(TRIM(value),result.separator)::group_concat_t ;
    ELSE
        RETURN ROW(result.result || result.separator || TRIM(value),result.separator)::group_concat_t;
    END IF;
END FUNCTION;

CREATE FUNCTION gc_comb(partial1 group_concat_t, partial2 group_concat_t)
    RETURNING group_concat_t;
    IF partial1 IS NULL OR partial1.result = '' THEN
        RETURN partial2;
    ELIF partial2 IS NULL OR partial2.result = '' THEN
        RETURN partial1;
    ELSE
        RETURN ROW(partial1.result || partial1.separator || partial2.result, partial1.separator)::group_concat_t;
    END IF;
END FUNCTION;

CREATE FUNCTION gc_fini(final group_concat_t) RETURNING LVARCHAR;
    RETURN final.result;
END FUNCTION;  

CREATE AGGREGATE group_concat
    WITH (INIT = gc_init, ITER = gc_iter,
          COMBINE = gc_comb, FINAL = gc_fini);
darkzangel
  • 939
  • 9
  • 17
0

I'd like to point you this answer on another similar question on Stack Overflow. You're looking for something like MySQL's group_concat() function.

Community
  • 1
  • 1
Keith Gaughan
  • 21,367
  • 3
  • 32
  • 30