1

I had tried to Parse a HTML table in oracle and try to extract the data inside it using the query

with tbl as
(
    select xmltype('<table class="table table-striped " border=''1''>
 <caption>ACO Details</caption>
 <tr><th><b>Record</b></th><th><b>Changed ACO(s)</b></th></tr>
 <tr><td> Added   ACO</td><td>TDE</td></tr></table>
 <br>
 <table style=''width:60%;'' border=''1''>
 <caption>Price Breakdown</caption>
 <tr style=''background-color: darkgray;text-align:right;''>
 <th style=''width:50%;text-align: center;''><b>Price</b></th>
 <th style=''width:30%;text-align: center;''><b>Difference</b></th></tr>
 <tr><td> Base Price</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Extras Price</td><td style=''text-align:right;''>43.98</td></tr>
 <tr><td> Total Discount</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Medical Price</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Postage</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Admin Fee</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Transaction Fee</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Auto Renewal Adjustment</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Stamp Duty</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Competitor Price Matching Discount</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Charge Adjustment</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Total Price</td><td style=''text-align:right;''>43.98</td></tr></table>
 <br/>
 <table style=''width:60%;'' border=''1''>
 <caption>Reason</caption><tr><td>Change Done in Web</td></tr>
 </table>
 <br/>') xml_data from dual
)
select
    x.class, x.caption
from
    tbl
cross join
    xmltable('/table/caption/tr'
        passing tbl.xml_data
        columns
            class varchar2(10) path '@class',
            caption varchar2(3) path 'caption[1]'
    ) x
where
    x.caption is not NULL;

but this throws the error

ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00245: extra data after end of document
Error at line 5
ORA-06512: at "SYS.XMLTYPE", line 310
ORA-06512: at line 1

Can someone please point me what I am doing wrong

Rajesh
  • 1,600
  • 5
  • 33
  • 59
  • 1
    HTML is not XML. this is not valid xml. – OldProgrammer Mar 14 '20 at 20:37
  • @OldProgrammer I had refereed this Answer - https://stackoverflow.com/questions/58076244/parse-html-table-with-oracle and tried the above query in it it parses the HTML table only – Rajesh Mar 14 '20 at 20:40

1 Answers1

2

Your HTML is invalid XML.

  • You have a <br> tag that does not have a matching closing tag; and
  • Your HTML is a fragment that does not have a single root element.

Put a leading <html><body> and training </body></html> and replace <br> with <br /> and the ORA-31011 exception goes away.

with tbl as
(
    select xmltype('<html><body><table class="table table-striped " border=''1''>
 <caption>ACO Details</caption>
 <tr><th><b>Record</b></th><th><b>Changed ACO(s)</b></th></tr>
 <tr><td> Added   ACO</td><td>TDE</td></tr></table>
 <br />
 <table style=''width:60%;'' border=''1''>
 <caption>Price Breakdown</caption>
 <tr style=''background-color: darkgray;text-align:right;''>
 <th style=''width:50%;text-align: center;''><b>Price</b></th>
 <th style=''width:30%;text-align: center;''><b>Difference</b></th></tr>
 <tr><td> Base Price</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Extras Price</td><td style=''text-align:right;''>43.98</td></tr>
 <tr><td> Total Discount</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Medical Price</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Postage</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Admin Fee</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Transaction Fee</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Auto Renewal Adjustment</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Stamp Duty</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Competitor Price Matching Discount</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Charge Adjustment</td><td style=''text-align:right;''>0.00</td></tr>
 <tr><td> Total Price</td><td style=''text-align:right;''>43.98</td></tr></table>
 <br/>
 <table style=''width:60%;'' border=''1''>
 <caption>Reason</caption><tr><td>Change Done in Web</td></tr>
 </table>
 <br/></body></html>') xml_data from dual
)
SELECT x.class,
       x.caption
FROM   tbl
       CROSS JOIN
       XMLTABLE(
         '/html/body/table'
         PASSING tbl.xml_data
         COLUMNS
            class   VARCHAR2(50) PATH './@class',
            caption VARCHAR2(30) PATH './caption/text()'
       ) x;

outputs:

CLASS                | CAPTION        
:------------------- | :--------------
table table-striped  | ACO Details    
null                 | Price Breakdown
null                 | Reason         

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117