I want to use XPath to get the href attribute from an a tag, but the tag occurs twice within the same file. How should I go about this? I need to check whether there is an href attribute with the value $street/object. I have this code, but it does not work:

$product_photo     = $xpath->query("//a[contains(@href,'{$object_street}fotos/')][1]");
$product_360       = $xpath->query("//a[contains(@href,'{$object_street}360-fotos/')][1]");
$product_blueprint = $xpath->query("//a[contains(@href,'{$object_street}plattegrond/')][1]");
$product_video     = $xpath->query("//a[contains(@href,'{$object_street}video/')][1]");

It does not return anything at all. Who can help me out?

user3239713

2 Answers

For the following HTML document:

<html>
  <body>
    <a href="http://www.example.com">Example</a> 
    <a href="http://www.stackoverflow.com">SO</a> 
  </body>
</html>

The XPath query /html/body//a/@href (or simply //a/@href) will return:

    http://www.example.com
    http://www.stackoverflow.com

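To select a specific instance, use /html/body//a[N]/@href. The [N] predicate counts position among the a elements under the same parent; to index across the whole document instead, use (/html/body//a)[N]/@href. For the document above: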

    $ /html/body//a[2]/@href
    http://www.stackoverflow.com

To test for a string contained in the attribute and return the attribute itself, place the check on the tag, not on the attribute:

    $ /html/body//a[contains(@href,'example')]/@href
    http://www.example.com

Mixing the two:

    $ /html/body//a[contains(@href,'com')][2]/@href
    http://www.stackoverflow.com
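
As a rough way to try these queries outside an XPath shell, here is a minimal Python sketch using only the standard library. It assumes the sample document above is well-formed enough to parse as XML. Note that xml.etree.ElementTree supports only a subset of XPath: it has no contains() and cannot select @href directly, so the substring check is done in Python instead. The helper name hrefs_containing is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The sample document from above; it is well-formed, so an XML parser accepts it.
HTML = """<html>
  <body>
    <a href="http://www.example.com">Example</a>
    <a href="http://www.stackoverflow.com">SO</a>
  </body>
</html>"""

root = ET.fromstring(HTML)  # root is the <html> element

# Rough equivalent of //a/@href: ElementTree returns elements,
# so the attribute value is read with .get().
all_hrefs = [a.get("href") for a in root.findall(".//a")]

# Rough equivalent of //a[2]/@href: positional predicates are 1-based
# and count among same-named siblings under one parent.
second_href = root.find(".//a[2]").get("href")


def hrefs_containing(tree_root, needle):
    """Hypothetical helper standing in for //a[contains(@href, needle)]/@href.

    ElementTree has no contains(), so the test is a plain Python substring check.
    Matches are returned in document order.
    """
    return [a.get("href") for a in tree_root.iter("a") if needle in a.get("href", "")]


example_hrefs = hrefs_containing(root, "example")

# "Mixing the two": the second match, counted across the whole document.
second_com_href = hrefs_containing(root, "com")[1]

print(all_hrefs)
print(second_href)
print(example_hrefs)
print(second_com_href)
```

Each query yields a list of nodes rather than a single string, which is why the attribute value has to be pulled out of the result explicitly.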
mockinterface
  • **EDIT:** How could I check for a specific href attribute? Shall I then use `/html/body//a[1]/@href='{$object_street}/x'`? – user3239713 Jan 30 '14 at 11:43
  • Thank you a lot for the effort! Unfortunately, I am still having trouble, I suppose it is not the query that is wrong. Do you mind taking a look at the procedural code for me and putting me on the right track? Because, if so, I will post the code. – user3239713 Jan 30 '14 at 12:01
  • Make sure your query evaluates the {$object_street} properly; maybe put it in a string first, as in "string s = //a[contains(@href,'{$object_street}fotos/')][1]/@href", and check that `s` looks all right. – mockinterface Jan 30 '14 at 12:14
  • I have put my question here, but nobody is responding to it. So maybe you could take a look at it, please? – user3239713 Jan 30 '14 at 12:37
  • Oh, I am sorry for not including the link: http://stackoverflow.com/questions/21406694/domdocument-and-xpath-no-url-passed?noredirect=1#comment32292479_21406694 – user3239713 Jan 30 '14 at 12:52
  • Apologies, I am not well versed enough in PHP to comment on your problem. The question looks too long to me though; maybe you could distill it to a small sample HTML (as in my example) and the essence of the PHP code that fails? That will make it easier for SO users to read and answer. – mockinterface Jan 30 '14 at 12:58
  • The problem is that I am not sure about where the code fails, whether it is about a conditional, or the XPath query or something else, haha. So I find it hard to distill it. – user3239713 Jan 30 '14 at 13:01
  • It is returning an array, not the specific string value. – Jeú Casulo Dec 19 '18 at 15:04
  • For some reason I don't get the URL back; I get `` instead of `{$link}`. – chovy Dec 11 '21 at 06:48

The answer shared by @mockinterface is correct, but I would like to add my 2 cents to it.

If you are using a framework like Scrapy, then you will have to use /html/body//a[contains(@href,'com')][2]/@href along with get(), like this:

response.xpath("//a[contains(@href,'com')][2]/@href").get()
Rahul Saxena