38

Is there a way in jsoup to extract an image's absolute URL, much like one can get a link's absolute URL?

Consider the following image element found at http://www.example.com/:

<img src="images/chicken.jpg" width="60px" height="80px">

I would like to get http://www.example.com/images/chicken.jpg. What should I do?

r0u1i

4 Answers

73

Once you have the image element, e.g.:

Element image = document.select("img").first();
String url = image.absUrl("src");
// url = http://www.example.com/images/chicken.jpg

Alternatively:

String url = image.attr("abs:src");

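Jsoup has a built-in absUrl() method on every node that resolves an attribute value to an absolute URL against the node's base URI (which may differ from the URL the document was retrieved from). If no base URI is available, for example because the HTML was parsed from a string without one, absUrl() returns an empty string for a relative value.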

See also the Working with URLs jsoup documentation.
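
To make the role of the base URI concrete, here is a minimal sketch that parses the question's HTML from a string rather than fetching it. The class name AbsUrlDemo and the helper firstImageUrl are made up for illustration; Jsoup.parse(html, baseUri), select(), first() and absUrl() are the jsoup calls involved.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AbsUrlDemo {
    // Hypothetical helper: returns the absolute src of the first <img>,
    // or "" if there is no image or the URL cannot be made absolute.
    static String firstImageUrl(String html, String baseUri) {
        // The second argument is the base URI that relative URLs resolve against.
        Document doc = Jsoup.parse(html, baseUri);
        Element image = doc.select("img").first();
        return image == null ? "" : image.absUrl("src");
    }

    public static void main(String[] args) {
        String html = "<img src=\"images/chicken.jpg\" width=\"60px\" height=\"80px\">";

        // With a base URI the relative src resolves to an absolute URL.
        System.out.println(firstImageUrl(html, "http://www.example.com/"));

        // With an empty base URI there is nothing to resolve against,
        // so absUrl() yields an empty string.
        System.out.println(firstImageUrl(html, "").isEmpty());
    }
}
```

If absUrl("src") unexpectedly comes back empty, checking the document's base URI is a reasonable first step.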

Jonathan Hedley
  • 3
    I've tried that, and it didn't work (returned an empty string), for some reason (unlike link.attr("abs:href") which worked) – r0u1i Feb 03 '11 at 09:12
  • That's odd. Can you post (or email me) a sample of it not working for you? I just added a passing test case to confirm it works: https://github.com/jhy/jsoup/commit/c659826cc45517535253fc59791fc53af95b5f9f – Jonathan Hedley Feb 03 '11 at 11:27
  • Sorry, after more investigation it was my fault. It works now, sorry for the misunderstanding. – r0u1i Feb 17 '11 at 09:14
  • @JonathanHedley Hi, I have a problem fetching a URL. Could you please answer my question at your convenience? Thanks in advance. http://stackoverflow.com/questions/36373775/canot-fetch-image-url-inside-specic-class-jsoup – Farid Apr 02 '16 at 14:11
  • 1
    When using a jsoup doc built from already-fetched HTML, you must first set the base URI with **doc.setBaseUri(..)** for this to work. – Gayan Weerakutti Oct 20 '16 at 13:01
12
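// Jsoup.connect() needs a full URL including the scheme (http:// or https://).
Document doc = Jsoup.connect("http://www.abc.com").get();
Elements images = doc.getElementsByTag("img");
for (Element el : images) {
    // absUrl() resolves the src attribute against the document's base URI.
    String src = el.absUrl("src");
    System.out.println("Image found!");
    System.out.println("src attribute is: " + src);
    getImages(src); // getImages() is your own method, e.g. one that downloads the image
}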
rrk
Gaurab Pradhan
2

Let's assume you are parsing http://www.example.com/index.html.

Use jsoup to extract the img src attribute, which gives you images/chicken.jpg.

You can then use the URI class to resolve this to an absolute URL:

URL url = new URL("http://www.example.com/index.html");
URI uri = url.toURI();
System.out.println(uri.resolve("images/chicken.jpg").toString());

prints

http://www.example.com/images/chicken.jpg
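
As a rough illustration of how this behaves for other kinds of src values, the sketch below wraps java.net.URI in a small helper. The class name ResolveDemo, the helper resolve, and the sample page and CDN URLs are made up for this example; only URI.create() and URI.resolve() come from the JDK.

```java
import java.net.URI;

class ResolveDemo {
    // Hypothetical helper: resolves a (possibly relative) reference
    // against the URL of the page it was found on.
    static String resolve(String pageUrl, String reference) {
        return URI.create(pageUrl).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/blog/post.html";

        // Relative to the page's directory.
        System.out.println(resolve(page, "images/chicken.jpg"));
        // Root-relative: replaces the whole path.
        System.out.println(resolve(page, "/images/chicken.jpg"));
        // Parent-directory reference: the ".." segment is normalized away.
        System.out.println(resolve(page, "../images/chicken.jpg"));
        // Already absolute: returned unchanged.
        System.out.println(resolve(page, "http://cdn.example.org/chicken.jpg"));
    }
}
```

This only does the URL arithmetic on a string you pass in; unlike absUrl() it does not take the document into account.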
dogbane
0

The image might be inside a div with a particular class, in which case the code would look like this (example only; ClassName is a placeholder):

System.out.println(doc.select("div.ClassName img").attr("abs:src"));
PHPFan