-1

BD: Oracle 11g r2

A great challenge is being for me.

How can I return a specific value within the query (html) with output at: dbms_output

DBMS_Output

...
<td colspan = "5"> <b> <h3> <center> Basic Consultation of the ICMS Registry of Bahia </center> </h3> </b> </td>
...

I would like to return only the amount below.

Expected result:

Consulta Básica ao Cadastro do ICMS da Bahia

Como retornar somente este valor na consulta?


Query PLSQL / Mateus Ribeiro.sql

declare
req utl_http.req;
res utl_http.resp;
url varchar2(4000) := 'http://www.sefaz.ba.gov.br/scripts/cadastro/cadastroBa/result.asp';
name varchar2(4000);
DADOS varchar2(4000); 
content varchar2(4000) := 'CGC=13504675000110&B1=CNPJ++-%3E&CPF=&IE=';

begin

req := utl_http.begin_request(url, 'POST',' HTTP/1.1');
--utl_http.set_header(req, 'user-agent', 'mozilla/4.0'); 
utl_http.set_header(req, 'content-type', 'application/x-www-form-urlencoded'); 
utl_http.set_header(req, 'Content-Length', length(content));

utl_http.write_text(req, content);

res := utl_http.get_response(req);

    begin
    loop
    utl_http.read_text(res, DADOS);
    dbms_output.put_line(DADOS);

    end loop;
    utl_http.end_response(res); 
    exception
    when utl_http.end_of_body then
    utl_http.end_response(res);
    end;
end;
Littlefoot
  • 131,892
  • 15
  • 35
  • 57
  • 1
    The requirement should be clearly formulated: how can you describe this exact part of the page? What if markup change tomorrow or someone decide to take control over the content and replace legacy tags with CSS? Your code will be invalid. – astentx Apr 20 '21 at 20:29
  • I am dedicated to maintenance if there is a change in the page. this page is stable, there is hardly any change. – Mateus Ribeiro Apr 20 '21 at 20:36
  • Is the issue with parsing the line of HTML in your first example, or reading the page in your screenshot to extract that line, or retrieving the page itself from a web address, or something to do with dbms_output? Surely once you have the desired string, you won't have any problem printing it in a debug message. – William Robertson Apr 20 '21 at 22:42

2 Answers2

0

If you managed to extract that string, then you could apply SUBSTR + INSTR combination to it, e.g.

SQL> with test (col) as
  2    (select '<td colspan = "5"> <b> <h3> <center> Basic Consultation of the ICMS Registry of Bahia </center> </h3> </b> </td>' from dual)
  3  select substr(col, instr(col, '<center>') + 9,
  4                     instr(col, '</center>') - instr(col, '<center>') - 10
  5               ) result
  6  from test;

RESULT
------------------------------------------------
Basic Consultation of the ICMS Registry of Bahia

SQL>
Littlefoot
  • 131,892
  • 15
  • 35
  • 57
0

Your document is HTML and it does not conform to the XML specification. Oracle libraries can parse an HTML document only if it is a valid XML.

With this code you could try to parse your document, but it fails.

declare
req utl_http.req;
res utl_http.resp;
url varchar2(4000) := 'http://www.sefaz.ba.gov.br/scripts/cadastro/cadastroBa/result.asp';
name varchar2(4000);
DADOS varchar2(4000); 
content varchar2(4000) := 'CGC=13504675000110&B1=CNPJ++-%3E&CPF=&IE=';
document_clob clob;
result_text varchar2(1000);
begin

req := utl_http.begin_request(url, 'POST',' HTTP/1.1');
--utl_http.set_header(req, 'user-agent', 'mozilla/4.0'); 
utl_http.set_header(req, 'content-type', 'application/x-www-form-urlencoded'); 
utl_http.set_header(req, 'Content-Length', length(content));

utl_http.write_text(req, content);

res := utl_http.get_response(req);

dbms_lob.createtemporary(document_clob, true);

    begin
    loop
    utl_http.read_text(res, DADOS);
    dbms_output.put_line(DADOS);
    dbms_lob.append(document_clob, DADOS);

    end loop;
    utl_http.end_response(res); 
    exception
    when utl_http.end_of_body then
    utl_http.end_response(res);
    end;
    
    select EXTRACT(xmltype(document_clob), '//table[id="Table4"]/tr/td/b/h3/center').getStringVal() into result_text from dual;
    
    dbms_output.put_line(result_text);
end;
/

It produces the following error message:

LPX-00225: end-element tag "head" does not match start-element tag "link"

In PL/SQL it is possible to parse HTML with regexps, but see this question: RegEx match open tags except XHTML self-contained tags

maksymsan
  • 66
  • 5
  • It is possible to convert HTML to XHTML with command line utilities. This might require installing a software on the database server. It also requires using packages like DBMS_SCHEDULER and might require accessing a server's file system. XHTML is parsable by Oracle's XML functions. – maksymsan Apr 20 '21 at 22:24
  • xmllint can extract a specific text from this document by using XPath. So in this case there is not much to do in PL/SQL - you just read the result from xmllint. – maksymsan Apr 20 '21 at 22:30