
I have three sample text nodes, and I want to extract a different part of the text from each one using a single, universal XPath expression.

First

<p class="product-summary">
                This is an amazing game from the company Midway Games. Excellent gameplay. Very good game.
            </p>

Second

<p class="product-summary">
                New Line Cinema distributed this movie in 1995.
            </p>

Third

<p class="product-summary">
                New game from 2011, with new 3D graphics. This game was made by NetherRealm Studios.  
            </p>

The extraction should return Midway Games, New Line Cinema, or NetherRealm Studios. Note that the text node always includes exactly one company, never two or three.

My attempt is based on this question, but it doesn't work, and it doesn't cover all three companies either.

substring('Midway Games',1,12*contains(//p[@class='product-summary']/following-sibling::text()[1], 'Midway Games'))
Liu Kang

1 Answer

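As the input will only ever contain one of the company names, you can use concat() to join the results. Each substring() call multiplies the name's length by the result of contains(), and XPath 1.0 converts true to 1 and false to 0, so a name that does not occur yields substring(..., 1, 0), which is the empty string. Only the matching name is left in the result.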

concat(
  substring('Midway Games', 1,
      12*contains(//p[@class='product-summary'], 'Midway Games')),
  substring('New Line Cinema', 1,
      15*contains(//p[@class='product-summary'], 'New Line Cinema')),
  substring('NetherRealm Studios', 1,
      19*contains(//p[@class='product-summary'], 'NetherRealm Studios'))
)

The line breaks are only there for readability; you can remove them if you like.
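To see why the concat()/substring() trick works, here is a rough Python emulation of the XPath 1.0 semantics. It is not an XPath engine: the function names are hypothetical, and the candidate list is simply the three companies from the question. The key point it mirrors is that the boolean from contains() becomes 0 or 1, so each substring() returns either the whole name or nothing.

```python
# Rough emulation (not a real XPath engine) of the XPath 1.0 idiom
#   substring(name, 1, len * contains(text, name))
# In XPath 1.0 a boolean converts to 1 (true) or 0 (false), so the
# length argument is either the full length of `name` or 0.

COMPANIES = ["Midway Games", "New Line Cinema", "NetherRealm Studios"]


def substring_if_contains(text: str, name: str) -> str:
    """Mimic substring(name, 1, len(name) * contains(text, name))."""
    length = len(name) * int(name in text)  # boolean -> 0 or 1
    # XPath positions start at 1, Python slices at 0; both take the prefix.
    return name[:length]


def extract_company(text: str) -> str:
    """Mimic concat(...) over all candidates; only the match is non-empty."""
    return "".join(substring_if_contains(text, name) for name in COMPANIES)


samples = [
    "This is an amazing game from the company Midway Games. "
    "Excellent gameplay. Very good game.",
    "New Line Cinema distributed this movie in 1995.",
    "New game from 2011, with new 3D graphics. "
    "This game was made by NetherRealm Studios.",
]

for text in samples:
    print(extract_company(text))
```

Because every non-matching candidate contributes an empty string, the order of the arguments to concat() does not matter as long as exactly one name occurs in the text.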

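I had to fix the query you provided: the text nodes are not following siblings of the <p> element but its children. You don't need to select them explicitly, though: contains() works on strings, so the XPath processor converts the <p> element to its string value, which is the concatenation of all text nodes below it.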
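The child-versus-sibling distinction can be illustrated with Python's standard-library ElementTree, used here only as a stand-in for an XPath processor. The enclosing <div> is an assumption added so that the <p> has something after it. ElementTree stores a child text node in .text and the text following the closing tag (what following-sibling::text()[1] would select) in .tail.

```python
import xml.etree.ElementTree as ET

# The second sample, wrapped in a hypothetical <div> parent.
doc = ET.fromstring(
    "<div>\n"
    '  <p class="product-summary">\n'
    "    New Line Cinema distributed this movie in 1995.\n"
    "  </p>\n"
    "</div>"
)

p = doc.find("p[@class='product-summary']")

# Child text node of <p>: this is where the company name lives.
print(repr(p.text.strip()))

# .tail is the text *after* </p>, roughly what XPath would select with
# following-sibling::text()[1]. Here it is only whitespace.
print(repr(p.tail))

# An element's string value in XPath is the concatenation of all text
# nodes below it; itertext() yields those pieces in document order.
string_value = "".join(p.itertext())
print("New Line Cinema" in string_value)  # True
```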

Jens Erat
  • Thank you so much, Jens. This works very well. Should this also work with DOMXPath? – Liu Kang Feb 10 '14 at 21:08
  • If you mean PHP's DOMXPath class: yes, it uses libxml, which is what I tested against. Anyway, you might consider running multiple queries and adding everything together in PHP; that would probably be cleaner code than this XPath hack. – Jens Erat Feb 10 '14 at 23:58
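The "multiple queries, combine in the host language" approach from the last comment might look roughly like the sketch below. The comment refers to PHP; this sketch swaps in Python's standard-library ElementTree, whose limited XPath subset is enough to select the paragraphs. Putting all three samples into one hypothetical <div> and the helper names find_company and companies_in are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Candidate names from the question; extend this tuple as needed.
COMPANIES = ("Midway Games", "New Line Cinema", "NetherRealm Studios")


def find_company(summary: str) -> Optional[str]:
    """Return the first known company mentioned in the summary, or None."""
    for name in COMPANIES:
        if name in summary:
            return name
    return None


def companies_in(markup: str) -> List[Optional[str]]:
    """Select every product summary, then match company names in Python."""
    root = ET.fromstring(markup)
    results = []
    for p in root.findall(".//p[@class='product-summary']"):
        text = "".join(p.itertext())  # string value of the <p> element
        results.append(find_company(text))
    return results


page = """<div>
  <p class="product-summary">This is an amazing game from the company Midway Games. Excellent gameplay. Very good game.</p>
  <p class="product-summary">New Line Cinema distributed this movie in 1995.</p>
  <p class="product-summary">New game from 2011, with new 3D graphics. This game was made by NetherRealm Studios.</p>
</div>"""

print(companies_in(page))
```

Keeping the company list in ordinary code means adding a new name is a one-line change, without recomputing string lengths inside an XPath expression.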