DB2 SQL How to split a column into potentially many rows?

Question

I've got a table with 2 fields (ID & Profile).

Table has lots of records, but is unique by ID.
Profile is 1000 characters long, and each 10 characters is a basically a Profile code.
Some ID's may have just 1 code, whilst others have many.

I don't know why the table was designed like this.

So a record might look something like:

ID              PROFILE
BOB             BM        BS        DM        FM        IC        IC6       IL        IM        JN101     MM        XC        XM        XR
BILL            ZZ        XY

Is there a way to split the Profile field, making new rows

So it would look like:

ID              PROFILE
BOB             BM
BOB             BS
BOB             DM
...
BILL            ZZ
BILL            XY

I'd really like to be able to create a View somehow.

Platform and version of Db2 are important to note. With Db2 for IBM i v7.3 and higher, IBM has included a `SPLIT()` [function](https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_74/rzajq/rzajqudfsplit.htm) for you... — Charles, Feb 24 '20 at 17:58
We're using Db2 for IBM i v7.3 - I'll look into SPLIT(). Thanks. — Anson, Feb 24 '20 at 21:25

score 4 · Accepted Answer · edited Jun 20 '20 at 09:12

4

Try this:

DB2 for LUW

/*
WITH TAB (ID, PROFILE) AS 
(
VALUES 
  ('BOB',  'BM        BS        DM        FM        IC        IC6       IL        IM        JN101     MM        XC        XM        XR')
, ('BILL', 'ZZ        XY')
)
*/
SELECT T.ID, V.TOK
FROM TAB T
, xmltable
(
'for $id in tokenize($s, " +") return <i>{string($id)}</i>' 
passing T.PROFILE as "s" 
columns 
  tok varchar(10) path '.'
) V;

dbfiddle example.

DB2 for LUW & IBM i

IBM i doesn't support (at least up to 7.4) FLWOR XQuery expressions, so you need to construct an XML document.

SELECT T.ID, V.TOK
FROM 
  TAB T
, XMLTABLE 
(
'$doc/d/i'
PASSING XMLPARSE(DOCUMENT '<d><i>' || regexp_replace(trim(T.PROFILE), '  +', '</i><i>') || '</i></d>') as "doc"
COLUMNS 
  TOK VARCHAR(10) PATH '.'
) V;

edited Jun 20 '20 at 09:12

Community

1
1

answered Feb 24 '20 at 07:17

Mark Barinstein

11,456
2
8
16

I copied and pasted that but got an error. "An XPath expression has an unexpected token string($id) following {." I wouldn't know how to fix this. – Anson Feb 24 '20 at 08:48
What’s your Db2 platform and version? – Mark Barinstein Feb 24 '20 at 09:47
Db2 for IBM i v7.3 – Anson Feb 24 '20 at 21:25
@Anson You should tag your question properly. I've added the `db2-400` tag to your question and updated the answer. – Mark Barinstein Feb 25 '20 at 13:25
Thanks @Mark. That works great. I didn't know about all the available tags so thanks for fixing that. – Anson Feb 25 '20 at 21:15

score 1 · Answer 2 · answered Feb 24 '20 at 23:11

Tried using split(). It seems very slow. Below takes 14.817 sec.

WITH TAB (ID, PROFILE) AS 
(
VALUES 
  ('BOB',  cast('BM        BS        DM        FM        IC        IC6       IL        IM        JN101     MM        XC        XM        XR' as char(1000))), 
  ('BILL', cast('ZZ        XY' as char(1000)))
)
SELECT a.id, b.element as profile
from  tab a, 
      TABLE (systools.split(a.profile , ' ')) b
where b.element <> '';

Compared with Joachim's answer taking only 134 ms!

WITH TAB (ID, PROFILE) AS 
(
VALUES 
  ('BOB',  cast('BM        BS        DM        FM        IC        IC6       IL        IM        JN101     MM        XC        XM        XR' as char(1000))), 
  ('BILL', cast('ZZ        XY' as char(1000)))
)
SELECT a.id, b.item as PROFILE
  from tab a
      ,XMLTABLE('$doc/items/item'
          PASSING XMLPARSE(DOCUMENT CAST(   '<items><item>'
                                         || replace(a.profile , ' ' , '</item><item>')
                                         || '</item></items>' as CLOB
                                        )
                           ) as "doc"
          COLUMNS
          ITEM VARCHAR(255) PATH '.'
      ) b 
WHERE b.item <> '';

T.Rex · Answer 3 · 2020-10-22T15:39:26.900

There is the SYSTOOLS.SPLIT function but she's behaving very strangely... I don't know what's going on... Very slow and buggy with long string as parameter !?!

CHECK> SELECT * FROM TABLE (SYSTOOLS.SPLIT('4182511-S_4182701-X_4182702-X_4182703-X_4182796-S' , '_'))
[2020-10-22 17:04:59] 0 rows retrieved in 86 ms (execution: 51 ms, fetching: 35 ms)
[2020-10-22 17:04:59] [0F001][-423] [SQL0423] Pointeur SQLP_L2.VWORK incorrect.
[2020-10-22 17:04:59] [0F001][-423] [SQL0423] Pointeur SQLP_L2.VWORK incorrect.
[2020-10-22 17:30:29] 5 rows retrieved starting from 1 in 255 ms (execution: 41 ms, fetching: 214 ms)
CHECK> SELECT * FROM TABLE (SYSTOOLS.SPLIT('4182511-S_4182701-X_4182702-X_4182703-X_4182796-S' , '_'))
[2020-10-22 17:30:29] 5 rows retrieved starting from 1 in 251 ms (execution: 39 ms, fetching: 212 ms)
CHECK> SELECT * FROM TABLE (SYSTOOLS.SPLIT('4182511-S_4182701-X_4182702-X_4182703-X_4182796-S' , '_'))
[2020-10-22 17:30:30] 5 rows retrieved starting from 1 in 253 ms (execution: 39 ms, fetching: 214 ms)
CHECK> SELECT * FROM TABLE (SYSTOOLS.SPLIT('4182511-S_4182701-X_4182702-X_4182703-X_4182796-S' , '_'))

DB2 for IBM i

Here is my custom SPLIT function from this one

CREATE OR REPLACE FUNCTION UTILS.Split (String VARCHAR(32000), SEP VARCHAR(5))
RETURNS TABLE (ID INT, VALUE VARCHAR(256))
  LANGUAGE SQL
  DISALLOW PARALLEL
  DETERMINISTIC
  NOT FENCED
  RETURN

  WITH CTE (ID,StartString,StopString) AS
  (
    SELECT 1 AS ID, 1 AS StartString, LOCATE(SEP, String) AS StopString
    FROM SYSIBM.SYSDUMMY1
    WHERE LENGTH(SEP) > 0 AND LENGTH(String) > 0
    UNION ALL
    SELECT ID + 1, StopString + LENGTH(SEP), LOCATE(SEP, String, StopString + LENGTH(SEP))
    FROM CTE
    WHERE StopString > 0
  )
  SELECT
    ID,
    SUBSTRING(String,StartString,
      CASE WHEN StopString = 0
      THEN LENGTH(String)
      ELSE StopString-StartString END
    )
  FROM CTE;

CHECK> SELECT * FROM TABLE (UTILS.SPLIT('4182511-S_4182701-X_4182702-X_4182703-X_4182796-S' , '_'))
[2020-10-22 17:30:32] 5 rows retrieved starting from 1 in 50 ms (execution: 36 ms, fetching: 14 ms)
CHECK> SELECT * FROM TABLE (UTILS.SPLIT('4182511-S_4182701-X_4182702-X_4182703-X_4182796-S' , '_'))
[2020-10-22 17:30:33] 5 rows retrieved starting from 1 in 63 ms (execution: 38 ms, fetching: 25 ms)

DB2 SQL How to split a column into potentially many rows?

3 Answers3

DB2 for LUW

DB2 for LUW & IBM i

DB2 for IBM i

Linked