0

I have a data stored in as HTML table as CLOB type in a table, I had tried to parse it like below with reference from Parse HTML table with Oracle

with tbl as
(
    SELECT ROW_ID,xmltype(SUBSTR(qe.NOTE, 0, INSTR(qe.NOTE, '<br>')-1)) xml_data
    FROM MY_Table qe
    WHERE EVENT='note' 
)
select
    ROW_ID,rtrim (xmlagg (xmlelement (e, x.data
        || '|')
      ORDER BY x.row_number).extract ('//text()'), '|') AS Data   
from
    tbl
cross join
    xmltable('/table/tr'
        passing tbl.xml_data
        columns
            row_number FOR ORDINALITY,
            data varchar2(2000) path 'td[1]'
            
    ) x
WHERE x.data IS NOT NULL
GROUP BY ROW_ID

the above is working for some values but it is breaking giving the error ORA-06502: PL/SQL: numeric or value error ORA-06512: at "SYS.XMLTYPE", line 272 as said in that reference itself We can try parsing an HTML file using a XML parser, but it’s likely to fail. The reason is that HTML documents can have the following HTML features that XML parsers don’t understand. -- https://stackoverflow.com/a/32573619/266304

Is there a HTML parser in Oracle instead of XML Parser, I had tried to search but nothing worked out

Anyone please help

Rajesh
  • 1,600
  • 5
  • 33
  • 59
  • 3
    You can use a Java HTML parser and embed the classes into the database using `CREATE JAVA` commands or the `loadjava` utility. – MT0 Jun 16 '21 at 08:59
  • I have only Read only rights in the database is it possible there ? – Rajesh Jun 16 '21 at 09:54
  • No, you would need to get the DBA to import the Java classes after you have created them if you were using a Java solution. – MT0 Jun 16 '21 at 10:45

1 Answers1

0

the above is working for some values but it is breaking giving the error ORA-06502: PL/SQL: numeric or value error ORA-06512: at "SYS.XMLTYPE", line 272

It will get that error if you have any NOTE values which do not have a <br> tag, because this:

SUBSTR(qe.NOTE, 0, INSTR(qe.NOTE, '<br>')-1)

will then be null, and xmltype() then throw that error.

If - and it's a big if - all of the notes start with a simple table with no embedded problematic tags, and that may or may not be followed be a line break tag, then you can use a case expression to only do the substr when needed:

with tbl as
(
    SELECT ROW_ID,
      xmltype(
        CASE
          WHEN INSTR(qe.NOTE, '<br>') > 0
          THEN SUBSTR(qe.NOTE, 0, INSTR(qe.NOTE, '<br>')-1)
          ELSE qe.NOTE
        END
      ) xml_data
    FROM MY_Table qe
    WHERE EVENT='note' 
)
...

Or perhaps slightly more robustly, look for and extract a table:

with tbl as
(
    SELECT ROW_ID,
      xmltype(SUBSTR(qe.NOTE, INSTR(qe.NOTE, '<table>'), INSTR(qe.NOTE, '</table>') + 7)) xml_data
    FROM MY_Table qe
    WHERE EVENT='note'
    AND INSTR(qe.NOTE, '<table>') > 0
)
...

db<>fiddle

But as you already know, this kind of approach is fraught with problems.

Alex Poole
  • 183,384
  • 11
  • 179
  • 318
  • Thanks Alex it worked but with errors because of special Characters I had replaced the special characters – Rajesh Jun 16 '21 at 11:40