
So I was curious (I'm new to Jsoup) whether there is a way to pull every piece of content from a page (for example, every image). I assume I would have to get the count of `img src` values and loop through them, but I don't understand how to do this independently of the page (i.e. I don't want to make it specific to one page; whatever URL I decide to crawl, the program should still work).
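
Counting the elements first isn't necessary: Jsoup's `select` returns every match as an `Elements` list, and the same selector works on any page. A minimal sketch, generalizing "every image" to every element that carries a `src` attribute (images, scripts, iframes, media) and grouping the URLs by tag name. `ResourceLister` and `listResources` are hypothetical names, and the inline HTML only stands in for a fetched page so the example runs offline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ResourceLister {

    /** Groups every element that has a src attribute by its tag name (img, script, iframe, ...). */
    public static Map<String, List<String>> listResources(Document doc) {
        Map<String, List<String>> byTag = new TreeMap<>();
        // "[src]" matches any element carrying a src attribute, whatever its tag
        for (Element el : doc.select("[src]")) {
            String url = el.absUrl("src");
            if (url.isEmpty()) {
                // absUrl returns "" when it cannot build an absolute URL; keep the raw value
                url = el.attr("src");
            }
            byTag.computeIfAbsent(el.tagName(), k -> new ArrayList<>()).add(url);
        }
        return byTag;
    }

    public static void main(String[] args) {
        // With a live page you would use: Document doc = Jsoup.connect(url).get();
        Document doc = Jsoup.parse(
                "<img src='a.png'><script src='/app.js'></script><img src='b.png'>",
                "https://example.com/");
        listResources(doc).forEach((tag, urls) -> System.out.println(tag + ": " + urls));
    }
}
```

Jsoup does not execute JavaScript, so anything a page adds at runtime (for example, images loaded on scroll) won't appear in the parsed `Document`.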

Here is my code. The problem is that it gets every `alt` attribute but not every `src` attribute (I'm using https://www.shutterstock.com/search/website as a test):

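import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document document = Jsoup.connect(url).get();

Elements idata = document.select("img");

for (Element e : idata) {
    // e is already an <img> element, so read its attributes directly
    System.out.println("SRC: " + e.attr("src"));
    System.out.println("ALT: " + e.attr("alt"));
}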
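
One common reason `src` comes back as a placeholder (or not at all) is lazy loading: the real URL sits in another attribute, often `data-src`, and a script copies it into `src` later, which Jsoup never runs. A minimal sketch, assuming the page uses `data-src` or `data-lazy-src` (common conventions, not a standard; check the page's markup), that prefers those attributes and falls back to `src`. `absUrl` resolves relative paths against the page URL and returns an empty string when it cannot, in which case the raw value is kept. `ImageCollector` and `collectImageUrls` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ImageCollector {

    // Checked in order. "data-src" and "data-lazy-src" are assumed lazy-loading
    // attributes (sites differ), so they win over a placeholder src.
    private static final String[] SRC_ATTRIBUTES = {"data-src", "data-lazy-src", "src"};

    /** Returns one URL per <img> element, skipping images with no usable source. */
    public static List<String> collectImageUrls(Document document) {
        List<String> urls = new ArrayList<>();
        for (Element img : document.select("img")) {
            String url = firstSource(img);
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    private static String firstSource(Element img) {
        for (String attr : SRC_ATTRIBUTES) {
            // absUrl returns "" for a missing attribute or an unresolvable value
            String url = img.absUrl(attr);
            if (url.isEmpty()) {
                url = img.attr(attr); // raw value, e.g. a data: URI
            }
            if (!url.isEmpty()) {
                return url;
            }
        }
        return "";
    }

    public static void main(String[] args) {
        // Inline HTML so the sketch runs offline; Jsoup.connect(url).get()
        // sets the base URI to the fetched page's URL instead.
        String html = "<img src='/logo.png' alt='Logo'>"
                + "<img src='data:image/gif;base64,R0lGODlh' data-src='photos/1.jpg' alt='Lazy'>"
                + "<img alt='No source'>";
        Document doc = Jsoup.parse(html, "https://example.com/page/");
        collectImageUrls(doc).forEach(System.out::println);
    }
}
```

If the images only appear after scrolling or other script activity, no attribute lookup will find them; that needs a tool that runs the page's JavaScript.
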
  • Hi there, welcome to StackOverflow! StackOverflow is about helping you with your code, not asking others to write code for you, so your question is off topic. But there are existing tutorials about using JSoup to build a web crawler. Google them, or start with this one: https://mkyong.com/java/jsoup-basic-web-crawler-example/ – Sean Patrick Floyd Jun 17 '21 at 00:08
  • Hi! Thanks, I've updated my post with the code I'm trying now, which should hopefully make it more specific. –  Jun 17 '21 at 00:32
  • See [Java HTML Parsing a Page with Infinite Scroll](https://stackoverflow.com/questions/32100804/java-html-parsing-a-page-with-infinite-scroll) –  Jun 17 '21 at 00:58
  • Ah, OK, that makes sense. However, when I try this on a smaller page (https://www.investopedia.com/articles/investing/012715/5-richest-people-world.asp), I can't get the src links for all of the images; for some I just get a long string like `data:image/gif;charset=utf-8;base64,....` –  Jun 17 '21 at 01:07
  • It is a base64-encoded image. See [How to display base64 encoded image in html](https://stackoverflow.com/questions/41053901/how-to-display-base64-encoded-image-in-html) –  Jun 17 '21 at 01:28
  • Oh, so I can just put that in the src attribute? Is there a tutorial or a way to display image srcs from a Java application on a page? –  Jun 17 '21 at 01:30
  • Yeah, I'm still having trouble displaying the src values from my Java application (it's an array of src values, so I want to iterate through it) on either my .html or .jsp page. –  Jun 17 '21 at 02:04
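
The `data:image/gif;...;base64,...` value mentioned in the comments is a data URI: the image bytes are embedded in the attribute itself, and a tiny GIF like that is often a lazy-loading placeholder rather than the real picture. A data URI is a valid `src` value as-is, so it can go straight into an `<img>` tag. If the bytes are needed in Java instead, a minimal sketch using `java.util.Base64` decodes the part after the comma. It handles only the base64 form, and `DataUri` and `decode` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DataUri {

    /** MIME type and raw bytes decoded from a base64 data: URI. */
    public static final class Decoded {
        public final String mimeType;
        public final byte[] bytes;

        Decoded(String mimeType, byte[] bytes) {
            this.mimeType = mimeType;
            this.bytes = bytes;
        }
    }

    /** Decodes "data:<mime>[;params];base64,<payload>"; non-base64 data URIs are rejected. */
    public static Decoded decode(String uri) {
        if (!uri.startsWith("data:")) {
            throw new IllegalArgumentException("not a data: URI");
        }
        int comma = uri.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("missing ',' separator");
        }
        // e.g. "image/gif;charset=utf-8;base64"
        String header = uri.substring("data:".length(), comma);
        if (!header.endsWith(";base64")) {
            throw new IllegalArgumentException("only base64 data URIs are handled here");
        }
        String mimeType = header.split(";", 2)[0];
        if (mimeType.isEmpty()) {
            mimeType = "text/plain"; // default media type when none is given
        }
        // The MIME decoder tolerates line breaks inside the payload
        byte[] bytes = Base64.getMimeDecoder().decode(uri.substring(comma + 1));
        return new Decoded(mimeType, bytes);
    }

    public static void main(String[] args) {
        Decoded d = decode("data:text/plain;base64,SGVsbG8=");
        System.out.println(d.mimeType + " " + new String(d.bytes, StandardCharsets.US_ASCII));
    }
}
```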

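For displaying an array of collected `src` values, one option is to generate the markup in Java: iterate the list and emit one `<img>` per entry, escaping each value for use inside an HTML attribute. This is a sketch with hypothetical names (`ImageGallery`, `renderImages`); in a JSP it is more typical to put the list in a request attribute and loop over it with JSTL's `<c:forEach>`, letting `<c:out>` or `fn:escapeXml` do the escaping.

```java
import java.util.List;

public class ImageGallery {

    /** Escapes the characters that are significant inside a quoted HTML attribute. */
    public static String escapeAttribute(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds one <img> tag per src, in list order, inside a wrapping <div>. */
    public static String renderImages(List<String> srcs) {
        StringBuilder html = new StringBuilder("<div>\n");
        for (String src : srcs) {
            html.append("  <img src=\"").append(escapeAttribute(src)).append("\">\n");
        }
        return html.append("</div>\n").toString();
    }

    public static void main(String[] args) {
        List<String> srcs = List.of(
                "https://example.com/logo.png",
                "data:image/gif;base64,R0lGODlh");
        System.out.print(renderImages(srcs));
    }
}
```

The returned string can then be written into an .html file (for example with `Files.writeString`) or sent through a servlet's response writer.
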