
This question looks similar to "XPath Get first element of subset", but I think it is a bit different.

Take the following blog: http://www.mademoiselledeco.com/

I want to get the first picture of each post. For that, I came up with the following XPath query:

//div[contains(@class,'type-post status-publish')]//img/@src

Following the example from the post I mentioned, I also tried:

//div[contains(@class,'type-post status-publish')](//img/@src)[1]

but that fails with:

Warning: DOMXPath::query(): Invalid expression

Any ideas?

Thanks a lot

justberare
  • Doesn't `//div[contains(@class,'type-post status-publish')]//img[1]/@src` work? – potame Mar 09 '15 at 13:25
  • No, unfortunately; it seems to take every img element of each post. What I'm trying to do is get only the first occurrence of the img tag – justberare Mar 09 '15 at 13:35

3 Answers

OK, after inspecting the source I understand: each <img> is wrapped in its own <p>, so img[1] matches every picture, because each one is the first image within its paragraph.
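
To illustrate the effect, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`, which supports a small XPath subset. The markup and file names are made up for the demonstration and only mimic the structure described above (one image per paragraph).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup mimicking the blog: every <img> sits in its own <p>.
SAMPLE = """
<div class="post">
  <p><img src="a.jpg"/></p>
  <p>text only</p>
  <p><img src="b.jpg"/></p>
</div>
"""

post = ET.fromstring(SAMPLE)

# The [1] predicate is evaluated per parent element, so each <img>
# counts as the "first" img inside its own <p> and both are returned.
first_per_parent = [img.get("src") for img in post.findall(".//img[1]")]
print(first_per_parent)
```

Both images come back, which is the behaviour the question ran into.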

In this context, I would rather try getting the first paragraph containing an image:

//div[contains(@class,'type-post status-publish')]//p[img][1]/img/@src

With this XPath I get 9 img/@src.

potame

//div[@class='post-content-container']//p[./img][1]/img

This is not the best solution but I think it would work.

//div[@class='post-content-container']

should match each post, and

//p[./img][1]/img

should match the first paragraph that contains an image, and then select that image.

Helmer

Actually, the duplicate question you've picked isn't that far off. One of its answers has an explanation that sounds pretty legit:

The [] operator has a higher precedence (binds stronger) than the // abbreviation.

So the //img abbreviation stands in your way. Let's expand it:

/descendant-or-self::node()/child::img

Adding [1] at the end applies the predicate to the child::img step only, so it selects every img that is the first img child of its parent (exactly as others have outlined). This is the higher precedence of the predicate at work.

The Abbreviated Syntax section of XPath 1.0 actually covers this with a note:

NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.

That is: you're not looking for the descendant-or-self axis and the children of any node therein, but just for the first img element on the descendant axis:

/descendant::img[1]

So the XPath expression in full:

//div[contains(@class,'type-post status-publish')]/descendant::img[1]/@src

Result with your example (10 matches):

 src="http://www.mademoiselledeco.com/wp-content/uploads/2015/03/Couleur-FionaLynch-Caroline-St.jpg"
 src="http://www.mademoiselledeco.com/wp-content/uploads/2015/02/2-OF-MO-cascade-lumineuse2-1024x398.jpg"
 src="https://s-media-cache-ak0.pinimg.com/736x/2e/f7/eb/2ef7eb28dc3e6ac9830cf0f1be7defce.jpg"
 src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/couleur-peinture-flamant-vert-trekking.jpg"
 src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/Lily-of-the-Valley-Designed-by-Marie-Deroudilhe-02.jpg"
 src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/shopping-decoration-jaune-bleu-delamaison-1024x866.jpg"
 src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/wikao-cheminee-berlin-mademoiselledeco4.jpg"
 src="http://www.mademoiselledeco.com/wp-content/uploads/2015/01/voeux2015-mademoiselledeco-blog.jpg"
 src="http://www.mademoiselledeco.com/wp-content/uploads/2014/12/suite-novotel-constance-guisset-1.jpg"
 src="http://www.mademoiselledeco.com/wp-content/uploads/2014/12/wish-list-decoration-noel-2014.jpg"
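
For anyone who wants to try the idea without PHP, here is a rough sketch in Python with the standard library's `xml.etree.ElementTree`. Note the assumptions: that module does not support the `descendant::` axis or `contains()`, so `find(".//img")`, which returns the first match in document order, stands in for `descendant::img[1]`, and the class check is done in plain Python. The markup, the file names and the helper `first_image_per_post` are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with two posts; images are nested at different depths.
SAMPLE = """
<html><body>
  <div class="post type-post status-publish">
    <p>intro</p>
    <p><img src="post1-a.jpg"/></p>
    <p><img src="post1-b.jpg"/></p>
  </div>
  <div class="post type-post status-publish">
    <p><img src="post2-a.jpg"/></p>
    <div><img src="post2-b.jpg"/></div>
  </div>
</body></html>
"""


def first_image_per_post(root, marker="type-post status-publish"):
    """Return the src of the first descendant <img> of each post <div>."""
    srcs = []
    for div in root.iter("div"):
        # Plain substring test, standing in for contains(@class, ...).
        if marker not in div.get("class", ""):
            continue
        # find() stops at the first match in document order, which
        # emulates descendant::img[1] relative to this <div>.
        img = div.find(".//img")
        if img is not None:
            srcs.append(img.get("src"))
    return srcs


root = ET.fromstring(SAMPLE)
print(first_image_per_post(root))
```

Each post contributes exactly one image, no matter how many paragraphs or nested containers the images sit in.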

I hope this sheds some light.

hakre