
When I use the jsoup example in my Android app on a page like this one, no images are found. Is that because the content is loaded through JavaScript?

My code currently looks like this:

Document doc = Jsoup.connect(url).get();
Elements media = doc.select("[src]");

print("\nMedia: (%d)", media.size());
List<Link> tmpLinks = new ArrayList<>();
int i = 0;
for (Element src : media) {
    if (src.tagName().equals("img")) {
        if (!src.attr("abs:src").contains(".png") && !src.attr("abs:src").contains(".gif")) {
            // A missing width/height attribute is treated as 0 (unknown).
            String widthString = src.attr("width");
            int width = widthString.isEmpty() ? 0 : Integer.parseInt(widthString);
            String heightString = src.attr("height");
            int height = heightString.isEmpty() ? 0 : Integer.parseInt(heightString);

            if (width == 0 || width >= minAllowedWidth) {
                if (!src.attr("abs:src").isEmpty()) {
                    tmpLinks.add(new Link(i, src.attr("abs:src"), width, height));
                }
            }
            i++;
            print(" * %s: <%s> %sx%s (%s)",
                    src.tagName(), src.attr("abs:src"), src.attr("width"), src.attr("height"),
                    trim(src.attr("alt"), 20));
        }
    }
}

List<Link> noDuplicates = new ArrayList<>();
Set<String> titles = new HashSet<>();
for (Link link : tmpLinks ) {
    if (titles.add(link.getUrl())) {
        noDuplicates.add(link);
    }
}

List<Link> finalLinks = new ArrayList<>();
for (Link link : noDuplicates) {
    URL testUrl = new URL(link.getUrl());
    URLConnection urlConnection = testUrl.openConnection();
    urlConnection.connect();
    int file_size = urlConnection.getContentLength();
    System.out.println("Fetching size: " + link.getUrl() + " " + file_size);
    if (file_size >= minAllowedFileSize) {
        link.setSize(file_size);
        finalLinks.add(link);
    }
}
Collections.sort(finalLinks, new LinkSizeComparator());

My second question: can I use jsoup to analyze subpages (or image links) on a site like this?

Tomas

1 Answer


I tried to crawl the website with jsoup and got no images, so apparently it does not work with dynamic websites: jsoup parses the HTML that the server returns and does not execute JavaScript.

You could try what is suggested in this thread: Getting Jsoup to support dynamically generated html by JavaScript
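
A quick way to confirm this diagnosis is to look at the raw markup before any script runs. The following is only a rough sketch: the class name, the helper `countImgTags`, and both sample pages are made up for illustration, and the regex check is deliberately crude (it would also count an `<img` that appears inside inline script text).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup: counts <img> tags in raw markup.
final class StaticMarkupCheck {
    // Matches the start of an <img ...> tag, case-insensitively.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b", Pattern.CASE_INSENSITIVE);

    public static int countImgTags(String html) {
        Matcher matcher = IMG_TAG.matcher(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Images are present in the markup the server sends.
        String staticPage =
                "<html><body><img src=\"a.jpg\"><img src=\"b.jpg\"></body></html>";
        // Images would only be inserted later by gallery.js (made-up file name).
        String scriptedPage =
                "<html><body><div id=\"gallery\"></div>"
                + "<script src=\"gallery.js\"></script></body></html>";

        System.out.println(countImgTags(staticPage));   // prints 2
        System.out.println(countImgTags(scriptedPage)); // prints 0
    }
}
```

In the app, the string to check could come from `Jsoup.connect(url).execute().body()`. If the count is 0 there as well, the images are most likely added by a script, and no jsoup selector will find them.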

Eric Martinez
  • Thank you, Eric, but I forgot to mention that I use jsoup in an Android app, so I cannot use the JavaScript libraries from the linked thread. I also tried to get the HTML as a string from a WebView, but without success. – Tomas May 27 '15 at 22:47
  • I've been thinking about it, but I can't come up with anything "easy". I would say: create a web service that uses HtmlUnit (or another library, if there is one), then consume that service from Android and forget about jsoup. I can't think of anything else. – Eric Martinez May 27 '15 at 23:13
  • Eric, you are right; I thought the same thing. But running a Java application on the server side is too expensive for me right now, so maybe in the future. Thank you for your response and tips! – Tomas May 28 '15 at 10:22
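
For anyone who does go the web service route from the comments, the Android side could look roughly like the sketch below. Everything about the service is an assumption: the endpoint, the `url` query parameter, and the response format (one image URL per line of plain text) are invented here, and the server part that renders the page with HtmlUnit is not shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client for a self-hosted rendering service.
final class RenderServiceClient {
    private final String endpoint; // e.g. "https://render.example.com/images" (made up)

    RenderServiceClient(String endpoint) {
        this.endpoint = endpoint;
    }

    /** Builds the request URL, encoding the page address as a query value. */
    public String buildRequestUrl(String pageUrl) {
        try {
            return endpoint + "?url=" + URLEncoder.encode(pageUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    /** Asks the service for the image URLs of a page; call this off the main thread. */
    public List<String> fetchImageUrls(String pageUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(buildRequestUrl(pageUrl)).openConnection();
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(30_000);
        try {
            if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected status: " + conn.getResponseCode());
            }
            List<String> urls = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    line = line.trim();
                    if (!line.isEmpty()) {
                        urls.add(line); // assumed format: one image URL per line
                    }
                }
            }
            return urls;
        } finally {
            conn.disconnect();
        }
    }
}
```

The returned URLs could then go through the same size filtering and sorting as in the question, so only the fetching step changes.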