
I want to extract all links matched by a complex selector such as .timestream .ui-ContentBottom h1 a. I know how to do it for a single, simple selector such as a:

 $dom = new DOMDocument;
 $dom->loadHTML($html);
 $xpath = new DOMXPath($dom);
 $nodes = $xpath->query('//a/@href');
 foreach($nodes as $href) {
   echo $href->nodeValue;
 }

I am new to XPath, so any help would be appreciated.

SanJeet Singh

2 Answers


The following XPath expression should work for you:

//*[contains(@class, "timestream")]//*[contains(@class, "ui-ContentBottom")]//h1//a/@href

The problem here is that XPath does not have a native class selector. In other words, contains(@class, "smth") is not exactly the same as .smth: contains() is a plain substring test, so contains(@class, "timestream") would also match an element with class="timestream-old". In practice, though, it usually works for matching a single class in a multi-valued class attribute.
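A minimal PHP sketch of a stricter, whole-token class test, using the same DOMDocument/DOMXPath setup as the question. The sample HTML and the helper name hasClass() are hypothetical. The expression pads normalize-space(@class) with spaces, so ' timestream ' only matches the complete class name and not, say, timestream-old.

```php
<?php
// Hypothetical sample markup; the "timestream-old" block shows how a plain
// contains(@class, ...) check can over-match.
$html = <<<'HTML'
<div class="timestream">
  <div class="box ui-ContentBottom">
    <h1><a href="/wanted">Wanted</a></h1>
  </div>
</div>
<div class="timestream-old">
  <div class="ui-ContentBottom">
    <h1><a href="/unwanted">Unwanted</a></h1>
  </div>
</div>
HTML;

// Hypothetical helper: XPath predicate roughly equivalent to the CSS ".name".
// Padding the normalized class list with spaces makes it match whole tokens only.
function hasClass(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// ".timestream .ui-ContentBottom h1 a", selecting the href attribute.
$query = '//*[' . hasClass('timestream') . ']'
       . '//*[' . hasClass('ui-ContentBottom') . ']'
       . '//h1//a/@href';

$hrefs = [];
foreach ($xpath->query($query) as $href) {
    $hrefs[] = $href->nodeValue;
}
echo implode("\n", $hrefs), "\n"; // prints /wanted

// For comparison, the looser contains() version also picks up "/unwanted".
$loose = $xpath->query('//*[contains(@class, "timestream")]//*[contains(@class, "ui-ContentBottom")]//h1//a/@href');
echo $loose->length, "\n"; // prints 2
```

The padded form is more verbose, which is why a small string-building helper is convenient when a selector has several class steps.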

alecxe
  • Could you please provide a link where I can read more about using complex XPath queries? Thanks :) – SanJeet Singh Sep 28 '15 at 03:20
  • @SanJeetSingh I cannot recall anything specific; just search for XPath tutorials, then practice and practice more! Glad to help. – alecxe Sep 28 '15 at 03:27

XPath lets you select nodes in a document such as an XML or HTML file.

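An XPath generated by the browser will not use classes in the path; when an element has an id, it uses that instead, written as an attribute test with the @ symbol (@id). XPath itself can still test any attribute, including @class.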

There are a few ways to obtain an XPath. One way in Chrome is to inspect the element, right-click it in the Elements panel of DevTools and choose Copy > Copy XPath.

When I do this on the textarea I am typing this answer in, I get the following XPath:

//*[@id="wmd-input"]

Do not let that confuse you, though. Here is a simpler example:

/html/body

This is the XPath of the body element.

I wrote a small function that evaluates an XPath expression and returns the matching elements as an array.

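function xpath(path) {
    // Evaluate the expression against the whole document.
    var iterator = document.evaluate(path, document, null, XPathResult.ANY_TYPE, null);
    var result = [];
    var found;
    // Collect every matching node into a plain array.
    while ((found = iterator.iterateNext())) {
        result.push(found);
    }
    return result;
}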

Running it against this textarea produces the following:

xpath('//*[@id="wmd-input"]');
[<textarea id="wmd-input" class="wmd-input processed" name="post-text" cols="92" rows="15" tabindex="101" data-min-length></textarea>]

Once you have the element, you can modify it, as in this example:

var test = xpath('/html/body');
test[0].innerHTML='bye';
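The helper above only runs in a browser. For the PHP setup in the question, a rough counterpart might look like the sketch below. The name xpathAll() and the sample markup are hypothetical; it simply wraps DOMXPath::query() and turns the resulting DOMNodeList into a plain array with iterator_to_array().

```php
<?php
// Hypothetical PHP counterpart of the browser-side xpath() helper:
// evaluate an XPath expression and return the matching nodes as an array.
function xpathAll(DOMDocument $dom, string $path): array
{
    $result = (new DOMXPath($dom))->query($path);
    // query() returns false (with a warning) for a malformed expression.
    return $result === false ? [] : iterator_to_array($result);
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>hello</p></body></html>');

$matches = xpathAll($dom, '/html/body');

// Rough counterpart of test[0].innerHTML = 'bye': replaces the body's
// children with a single text node.
$matches[0]->textContent = 'bye';

echo $dom->saveHTML($matches[0]), "\n";
```

Unlike innerHTML, textContent does not parse markup, so assigned text is inserted literally.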
Jesse