1

How can I retrieve a list off nth occurence of data in a clob?

Example of a clob:

<bank>
    <bankDetails>
        <bankDetailsList>
            <pk>1</pk>
            <accountName>
                <asCurrent>EDGARS LESOTHO</asCurrent>
            </accountName>          
            <bankAccountType>
                <asCurrent>CURR</asCurrent>
            </bankAccountType>
        </bankDetailsList>
        <bankDetailsList>
            <pk>2</pk>
            <accountName>
                <asCurrent>EDGARS LESOTHO 2</asCurrent>
            </accountName>
            <bankAccountType>
                <asCurrent>CURR</asCurrent>
            </bankAccountType>
        </bankDetailsList>
    </bankDetails>
</bank>

So I would like to retrieve all values of account names in sql assuming there might be up to nth list of this account names occurring in a clob.

I am using oracle 11g and SqlDeveloper 4.1.3 Your response is highly appreciated.

Chebaz
  • 23
  • 1
  • 8

2 Answers2

1
SELECT EXTRACTVALUE( v.COLUMN_VALUE, '/asCurrent' )
FROM   table_name t,
       TABLE(
         XMLSequence(
           EXTRACT(
             XMLType( t.clob_column ),
             '/bank/bankDetails/bankDetailsList/accountName/asCurrent'
           )
         )
       ) v
MT0
  • 143,790
  • 11
  • 59
  • 117
  • What is v.column_value? – Chebaz Apr 28 '16 at 14:02
  • @Chebaz When you use the `TABLE()` table function it aliases the column it returns to `COLUMN_VALUE` - so `v.COLUMN_VALUE` will contain an `XMLType` element for each part of the XML that the XPath `/bank/bankDetails/bankDetailsList/accountName/asCurrent` has matched. – MT0 Apr 28 '16 at 14:10
  • Thank you, I have got it working. You have really saved my day. – Chebaz Apr 28 '16 at 14:40
  • Thanks, I have made a green tick. @MT0 – Chebaz Jun 01 '16 at 12:27
0
SELECT level as rnk, regexp_substr(t.clob_column, 
          '<accountName>[^<]*?<asCurrent>([^<]*?)<', 1, level, null, 1) as acct_name
FROM t
CONNECT BY level <= (select regexp_count(clob_column, '<accountName>') FROM t);

t is the table name and clob_column is the column with clob values (in my test case, the table has one row and one column, the value being the one in the original post).

If you have a column of clob values and need to do this simultaneously for more than one value, this needs to be modified a bit; please clarify the requirement and we can take it from there.

ADDED: To make it work with several rows, you need to modify the CONNECT BY LEVEL clause. You want each row to only reference itself; and to avoid issues with cycles, you need to add one more condition. Like this:

...
CONNECT BY level <= (select regexp_count(clob_column, '<accountName>') FROM t)
       and clob_column= prior clob_column
       and prior sys_guid() is not null;
  • While regular expressions can work to get specific limited data out of XHTML - it is quite fragile and [does not work as a generic XHTML parser](http://stackoverflow.com/q/1732348/1509264). For example, if the source data is modified so that you have an optional nested element before the one you want, such as `Not ValidEDGARS LESOTHO`, then this will return an incorrect value. – MT0 Apr 28 '16 at 14:20
  • I have a column of clob values and I need to do this simultaneously because I have thousands of row having this clob values. – Chebaz Apr 28 '16 at 14:23
  • Thank you for your prompt response, I have MT0's query above working fine. – Chebaz Apr 28 '16 at 14:42
  • Added code to deal with several values in the CLOB column. You may want to test both solutions and see which is fastest for your data and setup; there are pros and cons to both approaches. –  Apr 28 '16 at 15:16