1

I have a text field that contains an HTML table in a PostgreSQL database. I would like to extract some data from this field: each TR (not from the header) in a row and each TD in a column. Is that possible?

The name of a table is "documentos" and the name of the text field that contains the HTML table is called "props". Props contains following:

select props
from documentos
where uidref = 'ee41f201-0049-41e9-9c5d-5c35e2cf73ac'

content of props field

And I would like obtain:

444444444 | Investigador | Daniel | Perez
555555555 | Becario      | Jorge  | Fernandez

Thanks in advance!

Andrew
  • 7,602
  • 2
  • 34
  • 42
  • Possible duplicate of [How do I parse XML in Python?](https://stackoverflow.com/questions/1912434/how-do-i-parse-xml-in-python) – PaulMcG Oct 26 '18 at 19:36
  • Maybe [this question could be of help](https://stackoverflow.com/questions/12049905/stripping-html-tags-in-postgresql). – Andrew Oct 26 '18 at 20:08

1 Answers1

1

I have no experience with PostreSQL and very little with XPATH, but I was able to get something for you:

with x as (select
'<TABLE>
        <TBODY>
            <TR>
                <TH class="RowTitle">Identificacion</TH>
                <TH class="colRol">Rol</TH>
            </TR>
            <TR class="tData">
                <TD class="RowTitle">
                    <A href="#">4444</A>
                </TD>
                <TD class="colRow" val="INVARGEXT">Investigador</TD>
            </TR>
            <TR class="tData">
                <TD class="RowTitle">
                    <A href="#">55555</A>
                </TD>
                <TD class="colRow" val="BECARIO">Becario</TD>
            </TR>
        </TBODY>
    </TABLE>'::xml as t
),
y as (select unnest(xpath('//TR[@class="tData"]', t)) td from x)
select -- y.td, -- just to debug
xpath('//TD[@class="RowTitle"]/A/text()', y.td),
xpath('//TD/text()', y.td)
from y;

This outputs:

    xpath   xpath
1   4444                                        Investigador
2   55555                                       Becario

I hope this can be of use.

More information here and here.

Andrew
  • 7,602
  • 2
  • 34
  • 42