
I'm using HtmlUnit to grab an href out of a page and am having some trouble.

The XPath is:

/html/body/div[2]/div/div/table/tbody/tr/td[2]/div/div[5]/div/div[2]/span/a    

On the webpage it looks like:

<a class="t" title="This Brush" href="http://domain.com/this/that">Brush Set</a>

In my code I am doing:

hrefs = page.getByXPath("//html/body/div[2]/div/div/table/tbody/tr/td[2]/div/div[5]/div/div[2]/span/a[@class='t']")

However, this returns the whole anchor element instead of just the URL I want.

What do I need to add to get just the href? (Note that the URL doesn't end in .html.)

StartingGroovy

1 Answer


You are selecting the `a` element. You want to select its `href` attribute, `a/@href`.

hrefs = page.getByXPath("//html/body/div[2]/div/div/table/tbody/tr/td[2]/div/div[5]/div/div[2]/span/a[@class='t']/@href")
Mads Hansen
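To see the difference between selecting `a` and `a/@href` without a live page, here is a minimal sketch in Java (HtmlUnit's host language). It deliberately swaps HtmlUnit for the JDK's own `javax.xml.xpath` API, and it assumes a hypothetical, well-formed XML stand-in for the page built from the tag in the question, with a shortened XPath `//span/a[@class='t']` in place of the full path. Evaluating each expression as a string shows that the `a` element's string value is its link text, while `a/@href` yields the URL.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class HrefXPathDemo {
    // Hypothetical stand-in for the real page: well-formed XML so the JDK parser
    // accepts it (HtmlUnit would parse the actual HTML instead).
    static final String PAGE =
          "<html><body><div><span>"
        + "<a class=\"t\" title=\"This Brush\" href=\"http://domain.com/this/that\">Brush Set</a>"
        + "</span></div></body></html>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse page", e);
        }
    }

    // Evaluates an XPath expression and converts the result to its XPath string value.
    static String evalString(Document doc, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(PAGE);
        // Selecting the <a> element itself: its string value is the link text.
        System.out.println(evalString(doc, "//span/a[@class='t']"));       // Brush Set
        // Selecting the href attribute of that <a>: its string value is the URL.
        System.out.println(evalString(doc, "//span/a[@class='t']/@href")); // http://domain.com/this/that
    }
}
```

HtmlUnit's `getByXPath` hands back nodes rather than strings, so with the `/@href` expression you receive attribute nodes and still have to read their values.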
  • Thank you for the prompt reply. Do you know why the following appears as well as the URL: DomAttr[name=href value= – StartingGroovy Nov 25 '10 at 01:45
  • I'm not familiar with Groovy, but my guess is that because you have selected the attribute, you are getting the `toString()` representation of the object rather than its string value. Try using `hrefs.getValue()` http://stackoverflow.com/questions/3667352/htmlunit-and-xpath-domnode-getbyxpath-only-works-on-htmlpage/3669846#3669846 – Mads Hansen Nov 25 '10 at 02:09
  • You are correct, Mads Hansen. Much appreciated. As a side note for anyone who runs into a similar issue: I had to use page.getFirstByXPath instead of page.getByXPath. – StartingGroovy Nov 30 '10 at 22:44
  • Mads Hansen, if you have a moment, could you take a look at: http://stackoverflow.com/questions/4320179/htmlunit-getbyxpath-returns-null – StartingGroovy Dec 01 '10 at 01:44
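The comments come down to two points: the `/@href` expression returns an attribute node (a `DomAttr` in HtmlUnit), so you read its string with `getValue()` instead of relying on `toString()`; and `getFirstByXPath` returns a single match, whereas `getByXPath` returns a list. A rough JDK-only analogue is sketched below. As before, it uses `javax.xml.xpath` rather than HtmlUnit and assumes a hypothetical page with two matching links. `XPathConstants.NODESET` plays the role of "all matches", `XPathConstants.NODE` plays the role of "first match in document order", and `Attr.getValue()` unwraps each attribute node into its URL string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstHrefDemo {
    // Hypothetical page with two matching links, to contrast "all" vs "first".
    static final String PAGE =
          "<html><body><span>"
        + "<a class=\"t\" href=\"http://domain.com/this/that\">Brush Set</a>"
        + "<a class=\"t\" href=\"http://domain.com/other\">Other</a>"
        + "</span></body></html>";

    // Shortened, hypothetical XPath ending in /@href so it selects attribute nodes.
    static final String HREF_XPATH = "//span/a[@class='t']/@href";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse page", e);
        }
    }

    // Analogous to getByXPath: every match, each an Attr node read via getValue().
    static List<String> allHrefs(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(HREF_XPATH, doc, XPathConstants.NODESET);
            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                hrefs.add(((Attr) nodes.item(i)).getValue());
            }
            return hrefs;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Analogous to getFirstByXPath: first match in document order, or null if none.
    static String firstHref(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Attr attr = (Attr) xpath.evaluate(HREF_XPATH, doc, XPathConstants.NODE);
            return attr == null ? null : attr.getValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(PAGE);
        System.out.println(allHrefs(doc));  // [http://domain.com/this/that, http://domain.com/other]
        System.out.println(firstHref(doc)); // http://domain.com/this/that
    }
}
```

In HtmlUnit terms, the same idea would be to take the single `DomAttr` from `page.getFirstByXPath(...)` and call `getValue()` on it, rather than calling `getValue()` on the list that `getByXPath` returns.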