0

I'm trying desperately hard to figure this out but with no luck. I'm trying to parse this XML data in Postgres:

<map>
  <entry>
    <string>id</string>
    <string>555</string>
  </entry>
  <entry>
    <string>label</string>
    <string>Need This Value</string>
  </entry>
  <entry>
    <string>key</string>
    <string>748</string>
  </entry>
</map>

I'm trying to get the value in the string element right after <string>label</string>. Note that the Postgres version I'm working does not have the XML (libxml) function installed.

I have tried many variations of:

substring(xmlStringData from E'<string>label</string>\\n<string>(.*?)</string>')

but with no luck.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
cYn
  • 3,291
  • 6
  • 25
  • 43
  • 2
    Why don't you use [`xpath()`](http://www.postgresql.org/docs/current/interactive/functions-xml.html#FUNCTIONS-XML-PROCESSING) instead? – Amal Murali Jul 30 '14 at 16:02
  • 1
    As mentioned in the original post, that function is not installed on the Postgres database that I'm working with and I can't simply install it due to it being a production database. – cYn Jul 30 '14 at 16:06
  • 2
    Regex? Drop them and you'll see the light. You may try to match with two captures (`12`) but XML may be pretty _irregular_ and your regex may work only with a well-defined strict format. – Adriano Repetti Jul 30 '14 at 16:07
  • Hmmm, you pretty much lead me in the right direction by mentioning well-defined strict format – cYn Jul 30 '14 at 16:13
  • `the Postgres version I'm working does not have the XML (libxml) function installed.` How about letting us in on the secret, then? What *is* your Postgres version? – Erwin Brandstetter Jul 30 '14 at 16:28
  • @ErwinBrandstetter, I'm working with version 9.0. That version does support XML functions, but the system administrators never installed libxml when they shipped the database. – cYn Jul 30 '14 at 16:31

3 Answers3

2

xpath() would be the right tool here. Because, you know ...

While stuck with your unfortunate situation, this would do the trick:

WITH t(x) AS (SELECT '<map>
  <entry>
    <string>id</string>
    <string>555</string>
  </entry>
  <entry>
    <string>label</string>
    <string>Need This Value</string>
  </entry>
  <entry>
    <string>key</string>
    <string>748</string>
  </entry>
</map>'::text
)
SELECT substring(x, '<string>label</string>[\s]*?<string>(.*?)</string>')
FROM  t

Returns:

substring
---------------
Need This Value

regexp explained:

<string>label</string> .. finds the position
[\s].. whitespace (including \n and \r)
*? .. do this "non-greedy", so ignore whitespace up until ...
<string>.. the next string element
(.*?) .. capturing parentheses, any characters, non-greedy
</string> .. up to the next appearance of the end tag

This is halfway reliable, unless you throw in unconventional XML formatting - which is why you should use an XML parser to begin with ...

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0

So I seem to got it figured out. I just needed to account for the spaces after the newline. The solution was:

substring(event_data from E'<string>label</string>\\n\\s*?<string>(.*?)</string>')
cYn
  • 3,291
  • 6
  • 25
  • 43
0

If your <entry> list is not variable. You can use the following regex and access to the capturing group in the 4th match to get the content.

<string>(.*?)<\/string>

Working demo

On the other hand, If you want to access at the first match, you can use the following regex:

<string>id<\/string>|<string>\d+<\/string>|<string>label<\/string>|<string>(.*?)<\/string>

Working demo

Federico Piazza
  • 30,085
  • 15
  • 87
  • 123
  • It seems like the OP wants grab that value immediately instead of having tons of matches to search through. – skamazin Jul 30 '14 at 16:17
  • 1
    @skamazin well, OP didn't specify that but you could be right. I provided another option to do it at the first time. Thanks for let me know that – Federico Piazza Jul 30 '14 at 16:28