
I've been looking through the jsoup site, but so far all I have managed is extracting titles and similar text from a URL. What I need is the whole absolute URL from one specific div, so that I can store it and use it later.

<div class="link-block container">
                <a href="/what-to-do/11636002" rel="nofollow" 
                        title="unique abilities" class="just-link">
                </a>
</div>

I tried `String absHref = link.attr("abs:href")`, but it gave me the "title" part of the markup. What am I doing wrong? Any advice would be appreciated.
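One thing worth checking (an assumption, since the question's code isn't shown): `abs:href` only works when the document has a base URI. jsoup resolves the relative `href` against that base, and if it cannot, it returns an empty string rather than throwing. This sketch parses the question's markup with and without a base URI; `http://your.baseurl` is the placeholder used in the answer, and the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsHrefBaseUriDemo {
    // The markup from the question, as a single string.
    static final String HTML =
        "<div class=\"link-block container\">"
        + "<a href=\"/what-to-do/11636002\" rel=\"nofollow\" "
        + "title=\"unique abilities\" class=\"just-link\"></a>"
        + "</div>";

    // First <a class="just-link"> in a document parsed with the given base URI.
    static Element firstLink(String baseUri) {
        Document doc = Jsoup.parse(HTML, baseUri);
        return doc.select("a.just-link").first();
    }

    // The href exactly as written in the markup (a relative path).
    static String rawHref() {
        return firstLink("").attr("href");
    }

    // The href resolved against baseUri; jsoup returns "" if it cannot resolve it.
    static String absHref(String baseUri) {
        return firstLink(baseUri).attr("abs:href");
    }

    public static void main(String[] args) {
        System.out.println(rawHref());                      // /what-to-do/11636002
        System.out.println("[" + absHref("") + "]");        // [] because there is no base URI
        System.out.println(absHref("http://your.baseurl")); // http://your.baseurl/what-to-do/11636002
    }
}
```

If `attr("abs:href")` comes back empty, the base URI is the first thing to look at; if it returns something like the title text, the selected element or attribute name is likely not the one intended.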

edinson
  • Show us your code implementation. – Manish May 08 '15 at 23:54
  • To get an absolute URL from part of one, you need to use a regex: http://stackoverflow.com/questions/29326901/converting-window-openhyperlink-javascript-code-to-pure-absolute-url-with-java – PHPFan May 09 '15 at 04:32
  • I found quite a simple way: URL baseUrl = new URL("my base url"); URL url = new URL(baseUrl, "/what-to-do/11636002"); and it works fine, because I get an absolute link at the end. Now could someone tell me how to extract the "/what-to-do/11636002" part, for example with jsoup? – edinson May 09 '15 at 06:57
  • If some answer worked for you then you should accept it. Else, if you have later found out a better solution to the problem, you can answer your own question and accept that. – gnsb Nov 19 '15 at 05:00
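The base-URL approach from edinson's comment can be made self-contained with only the standard library. This is a sketch: `http://your.baseurl` is a placeholder, the class name is hypothetical, and the checked `MalformedURLException` is wrapped so callers don't need a `throws` clause. It is the same kind of resolution jsoup performs for `abs:href` once the document has a base URI.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ResolveRelativeUrl {
    // Resolves a (possibly relative) href against a base URL.
    static String resolve(String base, String relative) {
        try {
            URL baseUrl = new URL(base);
            // The two-argument constructor resolves 'relative' against 'baseUrl'.
            return new URL(baseUrl, relative).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve " + relative + " against " + base, e);
        }
    }

    public static void main(String[] args) {
        // prints http://your.baseurl/what-to-do/11636002
        System.out.println(resolve("http://your.baseurl", "/what-to-do/11636002"));
    }
}
```

A path starting with `/` replaces the base path entirely, while a path without a leading slash is resolved relative to the base's directory.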

1 Answer


You can do it like this:

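import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String myHtml = "<div class=\"link-block container\">\n"
                + "  <a href=\"/what-to-do/11636002\" rel=\"nofollow\" title=\"unique abilities\" class=\"just-link\">\n"
                + "  </a>\n"
                + "</div>";

// The second argument is the base URI that "abs:" attributes are resolved against.
Document doc = Jsoup.parseBodyFragment(myHtml, "http://your.baseurl");
Element e = doc.select("a").first();

System.out.println(e.attr("abs:href"));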

Prints:

http://your.baseurl/what-to-do/11636002

If you want to get all similar a elements, do:

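// Needs: import org.jsoup.select.Elements;
Elements elements = doc.select("a[href*=/what-to-do/]");
// Use a new loop variable name; "e" is already declared above.
for (Element link : elements) {
   System.out.println(link.attr("abs:href"));
}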

This selects every a element whose href contains "/what-to-do/".
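Since the question asks to store the URL for later use, here is a variation of that loop that collects the absolute URLs into a `List<String>` instead of printing them. It's a sketch: the class and method names are hypothetical, and `http://your.baseurl` is the answer's placeholder base URL.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CollectAbsLinks {
    // Absolute URLs of every <a> whose href contains "/what-to-do/".
    static List<String> whatToDoLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element link : doc.select("a[href*=/what-to-do/]")) {
            urls.add(link.attr("abs:href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/what-to-do/1\">one</a>"
            + "<a href=\"/about\">about</a>"
            + "<a href=\"/what-to-do/2\">two</a>";
        // prints [http://your.baseurl/what-to-do/1, http://your.baseurl/what-to-do/2]
        System.out.println(whatToDoLinks(html, "http://your.baseurl"));
    }
}
```

Returning a list keeps parsing separate from whatever is done with the URLs later, such as saving them or fetching each page.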

Jonas Czech
  • The problem is that I don't know how to get exactly this part into my variable (for example myHtml). That was my question. – edinson May 09 '15 at 08:42
  • @edinson Where do you want to get it from? If you have myHtml as a String, then parse it as in my answer. If it's from a URL, use `Jsoup.connect(yourUrl).get();`. Or do you mean something else? It's not quite clear to me. – Jonas Czech May 09 '15 at 08:55
  • I have a whole HTML page, and I need to extract the "/what-to-do/11636002" part from the page's code. It is not the only URL in the code. – edinson May 10 '15 at 09:59
  • @edinson Just select the `a` elements you want from the page. I've updated my answer. – Jonas Czech May 10 '15 at 12:40
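For the "whole page with many URLs" case in the last comments, one option is to scope the selector to the question's `div.link-block`. That way other `/what-to-do/` links elsewhere on the page are ignored. This is a sketch: the sample page, class name and base URL are illustrative assumptions. `absUrl("href")` is jsoup's method equivalent to `attr("abs:href")`. For a live page, `Jsoup.connect(url).get()` (from the comment above) would replace `Jsoup.parse`; the fetched document's base URI is then the page URL, so no base needs to be passed.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinksInDiv {
    // Hypothetical page: one matching link outside the div, one inside it.
    static final String SAMPLE_PAGE = "<html><body>"
        + "<a href=\"/what-to-do/999\">link outside the block</a>"
        + "<div class=\"link-block container\">"
        + "<a href=\"/what-to-do/11636002\" rel=\"nofollow\" "
        + "title=\"unique abilities\" class=\"just-link\"></a>"
        + "</div></body></html>";

    // Absolute URLs of a.just-link elements inside div.link-block only.
    static List<String> linkBlockUrls(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element link : doc.select("div.link-block a.just-link[href]")) {
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // prints [http://your.baseurl/what-to-do/11636002]
        System.out.println(linkBlockUrls(SAMPLE_PAGE, "http://your.baseurl"));
    }
}
```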