91

I want to select elements by class alone, for a class called date (.date in CSS), whatever the element type.

For some reason I cannot get this to work. If anyone can see what is wrong with my code, help would be much appreciated.

@$doc = new DOMDocument();
@$doc->loadHTML($html);
$xml = simplexml_import_dom($doc); // just to make xpath more simple
$images = $xml->xpath('//[@class="date"]');                             
foreach ($images as $img)
{
    echo  $img." ";
}
General Grievance
Teddy13
  • And what about a piece of the HTML? (Preferably show us the SimpleXML output from asXML(), as it is nearer to XPath.) – SergeS Jan 10 '12 at 19:03
  • if there is multiple classes you need to do `contains(@class, 'date')` – Gordon Jan 10 '12 at 19:04
  • possible duplicate of [PHP - Parse All Links That Contain A Speciffic Word In "href" Tag](http://stackoverflow.com/questions/8208240/php-parse-all-links-that-contain-a-speciffic-word-in-href-tag) – Gordon Jan 10 '12 at 19:09
  • possible duplicate of [XPath: How to match attributes that contain a certain string](http://stackoverflow.com/questions/1390568/xpath-how-to-match-attributes-that-contain-a-certain-string) – hakre Jun 13 '12 at 17:34
  • @Gordon's answer is dangerous, if the class attribute is "datetime" it would also match. user716736's answer is more complete. – Niels Bom Oct 12 '12 at 13:25
  • @NielsBom *dangerous* is a rather odd choice of words. Also, my answer clearly states it will find all links that contain (note the emphasis) the search word. And it doesn't change that this question is a dupe of many. – Gordon Oct 12 '12 at 14:37
  • Dupe:sure. Dangerous: I'll rephrase that to: you might get more than you would expect. And sorry but I don't think your comment ("if there is...'date')") is clear. – Niels Bom Oct 16 '12 at 08:39
  • Related: http://stackoverflow.com/questions/1604471/how-can-i-find-an-element-by-css-class-with-xpath and http://stackoverflow.com/questions/1390568/how-to-match-attributes-that-contain-a-certain-string – Timo Huovinen Mar 31 '14 at 12:38
  • Possible duplicate of [How can I find an element by CSS class with XPath?](https://stackoverflow.com/questions/1604471/how-can-i-find-an-element-by-css-class-with-xpath) – siegi Sep 17 '17 at 10:26

6 Answers

256

I want to write the canonical answer to this question, because the contains(@class, ...) approach suggested above has a problem.

Our problem

The CSS selector:

.foo

will select any element that has the class foo.

How do you do this in XPath?

Although XPath is more powerful than CSS, XPath doesn't have a native equivalent of a CSS class selector. However, there is a solution.

The right way to do it

The equivalent selector in XPath is:

//*[contains(concat(" ", normalize-space(@class), " "), " foo ")]

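The function normalize-space strips leading and trailing whitespace (and also replaces sequences of whitespace characters by a single space). Padding the normalized value with one space on each side means every class name in the list is surrounded by spaces, so searching for " foo " can only match a whole class name.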

(In a more general sense) this is also the equivalent of the CSS selector:

*[class~="foo"]

which will match any element whose class attribute value is a list of whitespace-separated values, one of which is exactly equal to foo.
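As a minimal sketch of how this expression might be used from PHP (the language of the question), the helper below builds it for a given class name and runs it with DOMXPath. The function name classXPath and the sample markup are assumptions made for illustration, and the helper assumes the class name contains no double quotes.

```php
<?php
// Hypothetical helper: builds the whole-token class test for one class name.
// Assumes $class contains no double quote characters.
function classXPath(string $class): string
{
    return '//*[contains(concat(" ", normalize-space(@class), " "), " ' . $class . ' ")]';
}

// Sample markup, invented for this example.
$html = '<div><span class="date">2012-01-10</span>'
      . '<p class="  date  small ">today</p>'
      . '<b class="datetime">19:03</b></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Collect the names of the matching elements.
$matches = [];
foreach ($xpath->query(classXPath('date')) as $node) {
    $matches[] = $node->nodeName;
}

echo implode(' ', $matches), "\n"; // the span and the p, but not the b
```

The b element is skipped because its only class is datetime, which contains date as a substring but not as a whole class name.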

A couple of obvious, but wrong ways to do it

The XPath selector:

//*[@class="foo"]

doesn't work, because it won't match an element that has more than one class, for example:

<div class="foo bar">

It also won't match if there is any extra whitespace around the class name:

<div class="  foo ">

The 'improved' XPath selector:

//*[contains(@class, "foo")]

doesn't work either, because it wrongly matches elements with the class foobar, for example:

<div class="foobar">
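To see the difference between the three selectors discussed in this answer, one could run them side by side. The following is only a sketch: the sample markup and the array keys are invented for the example.

```php
<?php
// Sample markup, invented for this example: one element per case discussed above.
$html = '<div class="foo">exact</div>'
      . '<div class="foo bar">two classes</div>'
      . '<div class="  foo ">extra whitespace</div>'
      . '<div class="foobar">different class</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$expressions = [
    'exact'    => '//*[@class="foo"]',
    'contains' => '//*[contains(@class, "foo")]',
    'token'    => '//*[contains(concat(" ", normalize-space(@class), " "), " foo ")]',
];

// Count how many elements each expression selects.
$counts = [];
foreach ($expressions as $label => $expression) {
    $counts[$label] = $xpath->query($expression)->length;
    echo $label, ': ', $counts[$label], "\n";
}
```

The contains variant reports all four elements, including the foobar one, while the token variant reports the three that actually carry the class foo. The exact comparison finds fewer than that.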

Credit goes to this fella, who published the earliest solution to this problem that I found on the web: http://dubinko.info/blog/2007/10/01/simple-parsing-of-space-seprated-attributes-in-xpathxslt/

John Smith
user716736
13

//[@class="date"] is not a valid XPath expression: the step is missing a node test (such as * or an element name) before the predicate.

Try //*[@class="date"], or if you know it is an image, //img[@class="date"]
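Applied to the code from the question, the corrected expressions might look like the sketch below. The sample markup is invented, since the question does not show the HTML, and the variable names $anyElement and $imagesOnly are illustrative.

```php
<?php
// Sample markup, invented for this example.
$html = '<div><span class="date">2012-01-10</span><img class="date" src="a.png" alt="pic"></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);

// "*" is the node test that was missing from the original expression.
$anyElement = $xml->xpath('//*[@class="date"]');
// Restrict the match to img elements only.
$imagesOnly = $xml->xpath('//img[@class="date"]');

foreach ($anyElement as $element) {
    echo $element->getName(), ' ';
}
echo "\n", count($imagesOnly), "\n";
```

Note that this exact comparison only matches elements whose class attribute is precisely date, with no other classes and no extra whitespace.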

MrGlass
7

XPath 3.1 introduces a function contains-token and thus finally solves this ‘officially’. It is designed to support classes.

Example:

//*[contains-token(@class, "foo")]

This function makes sure that whitespace (not only the space character, U+0020) is handled correctly, works in case of class name repetition, and generally covers the edge cases.


Note: As of today (2016-12-13), XPath 3.1 has the status of Candidate Recommendation.

Robin Pokorny
  • It does not work in today's latest chrome. Until it works, how do we get around the limitation that //*[contains(@class, "foo")] will also select any class that contains foo, such as foobar, fooz etc. – MasterJoe Mar 29 '18 at 19:25
3

In XPath 2.0 you can use:

//*[count(index-of(tokenize(@class, '\s+' ), 'foo')) = 1]

as stated by Christian Weiske in: https://cweiske.de/tagebuch/XPath%3A%20Select%20element%20by%20class.htm
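PHP's DOMXPath and SimpleXML are built on libxml2, which implements XPath 1.0, so tokenize() and index-of() are not available there. A rough equivalent, sketched here, selects every element that has a class attribute and splits the value in PHP instead. The function name elementsWithClass and the sample markup are hypothetical. Unlike the count(...) = 1 test above, this sketch also accepts a class name that is repeated in the list.

```php
<?php
// Hypothetical helper: returns the elements whose class list contains $class.
function elementsWithClass(DOMDocument $doc, string $class): array
{
    $xpath = new DOMXPath($doc);
    $found = [];
    foreach ($xpath->query('//*[@class]') as $element) {
        // Split on runs of whitespace, dropping empty tokens at the edges.
        $tokens = preg_split('/\s+/', $element->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
        if (in_array($class, $tokens, true)) {
            $found[] = $element;
        }
    }
    return $found;
}

// Sample markup, invented for this example.
$doc = new DOMDocument();
$doc->loadHTML('<p class="foo bar">a</p><p class="foobar">b</p><p class=" foo ">c</p>');

$texts = [];
foreach (elementsWithClass($doc, 'foo') as $element) {
    $texts[] = $element->textContent;
}
echo implode(',', $texts), "\n"; // a,c
```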

Memke
  • unfortunately this doesn't seem to be implemented by chrome as of 6/12/2017. based on https://en.wikipedia.org/wiki/Comparison_of_layout_engines_(XML)#Query_technologies it seems to be lacking pretty much across the board – JonnyRaa Dec 06 '17 at 17:54
1

HTML allows case-insensitive element and attribute names, and class is a space-separated list of class names. Here is an expression for an img element with the class named date (as written, it selects the matching class attribute node):

//*['IMG' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')]/@*['CLASS' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') and contains(concat(' ', normalize-space(.), ' '), concat(' ', 'date', ' '))]

See as well: CSS Selector to XPath conversion

hakre
1

Beware of minus signs in the template! If you are querying for "my-ownclass" in a DOM like this:

<ul class="my-ownclass"><li>...</li></ul>
<ul class="someother"><li>...</li></ul>
<ul><li>...</li></ul>

$finder = new DOMXPath($dom);
// This will NOT behave as expected! It will strangely match all the <ul> elements in the DOM.
$nodes = $finder->query(".//ul[contains(@class, 'my-ownclass')]");
// This will match the element.
$nodes = $finder->query(".//ul[contains(@class, 'ownclass')]");
Vlado