14

Below is the column response from my Postgres table. I want to extract the status from all the rows in my Postgres database. The status could be of varying sizes like SUCCESS as well so I do not want to use the substring function. Is there a way to do it?

<?xml version="1.0" ?><response><status>ERROR_MISSING_DATA</status><responseType>COUNTRY_MISSING</responseType><country_info>USA</country_info><phone_country_code>1234</phone_country_code></response>

so my table structure is like this

   Column    |            Type             |                        Modifiers                         

-------------+-----------------------------+----------------------------------------------------------

 id          | bigint                      | not null default nextval('events_id_seq'::regclass)
 hostname    | text                        | not null
 time        | timestamp without time zone | not null
 trn_type    | text                        | 
 db_ret_code | text                        | 
 request     | text                        | 
 response    | text                        | 
 wait_time   | text                        | 

And I want to extract status from each and every request. How do i do this?

Below is a sample row. And assume the table name abc_events

id          | 1870667
hostname    | abcd.local
time        | 2013-04-16 00:00:23.861
trn_type    | A
request     | <?xml version="1.0" ?><response><status>ERROR_MISSING_DATA</status><responseType>COUNTRY_MISSING</responseType><country_info>USA</country_info><phone_country_code>1234</phone_country_code></response>
response    | <?xml version="1.0" ?><response><status>ERROR_MISSING_DATA</status><responseType>COUNTRY_MISSING</responseType><country_info>USA</country_info><phone_country_code>1234</phone_country_code></response>
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
ronak
  • 1,770
  • 3
  • 20
  • 34
  • request has the xml request and response has the xml response. And the xml that I have shown above is an example of the response. Essentially I am trying to extract the status tag from each and every xml response row. – ronak Apr 16 '13 at 19:49

1 Answers1

26

Use the xpath() function:

WITH x(col) AS (SELECT '<?xml version="1.0" ?><response><status>ERROR_MISSING_DATA</status></response>'::xml)
SELECT xpath('./status/text()', col) AS status
FROM   x

/text() strips the surrounding <status> tag.
Returns an array of xml - with a single element in this case:

status
xml[]
-------
{ERROR_MISSING_DATA}

Applied to your table

In response to your question update, this can simply be:

SELECT id, xpath('./status/text()', response::xml) AS status
FROM   tbl;

If you are certain there is only a single status tag per row, you can simply extract the first item from the array:

SELECT id, (xpath('./status/text()', response::xml))[1] AS status
FROM   tbl;

If there can be multiple status items:

SELECT id, unnest(xpath('./status/text()', response::xml)) AS status
FROM   tbl;

Gets you 1-n rows per id.

Cast to xml

Since you defined your columns to be of type text (instead of xml, you need to cast to xml explicitly. The function xpath() expects the 2nd parameters of type xml. An untyped string constant is coerced to xml automatically, but a text column is not. You need to cast explicitly.

This works without explicit cast:

  SELECT xpath('./status/text()'
      ,'<?xml version="1.0" ?><response><status>SUCCESS</status></response>')

A CTE like in my first example needs a type for every column in the "common table expression". If I had not cast to a specific type, the type unknown would have been used - which is not the same thing as an untyped string. Obviously, there is no direct conversion implemented between unknown and xml. You'd have to cast to text first: unknown_type_col::text::xml. Better to cast to ::xml right away.

This has been tightened with PostgreSQL 9.1 (I think). Older versions were more permissive.

Either way, with any of these methods the string has to be valid xml or the cast (implicit or explicit) will raise an exception.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Do you need the `::xml`? I was just doing `SELECT xpath('...', 'xml');` and it seems to work. – Phrogz Apr 15 '13 at 20:23
  • I edited my question. Essentially what I want is to extract value of a tag from the column that has the xml request/response. – ronak Apr 16 '13 at 19:41
  • I followed it but I am getting this error `LINE 1: select unnest(xpath('./status/text()', request)) from abc_events ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts.` It is pointing to the xpath function. – ronak Apr 16 '13 at 20:02
  • @Phrogz: I added a chapter on the topic of casting, since my initial comment wasn't completely correct. A cast is actually *needed* with a CTE in this case ... – Erwin Brandstetter Apr 16 '13 at 20:35
  • @ronak: I added a bit to my answer. Note the addendum about casting to xml. Also note I had the wrong cast at first. Must be `::xml`. – Erwin Brandstetter Apr 16 '13 at 20:40
  • @ronak: Cool. :) For more advanced acrobatics with `xpath()` consider [this related answer](http://stackoverflow.com/questions/7491479/xml-data-to-postgresql-database/7628453#7628453). – Erwin Brandstetter Apr 17 '13 at 01:04
  • But what if there is multiple tags in a column? How i can extract them? Suppose xml-data in a column is like - abcERROR_MISSING_DATA – Aamir Jan 17 '14 at 12:56
  • Hi, simple "cast XML to text" must use `//text()`... So `array_to_string( xpath('path//text()', xcontent)::text[] , '')` to obtain all text from, eg., the TXT of an HTML document. – Peter Krauss Dec 11 '17 at 18:44
  • ERROR_MISSING_DATA When the xml is something like this how can I get the xpath? For every entry in the table the urls may not be the same so I cannot keep the url as a part of the xpath right? – Surya Mar 05 '19 at 04:55