
I am trying to get all divs from a website. With google.com or other pages it works fine; only Instagram gives an empty result. The method looks like this:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public static List<String> getPhotoPaths(String url) {
    List<String> paths = new ArrayList<>();

    try {
        // Called with url = "https://www.instagram.com/explore/tags/test/"
        Document doc = Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/535.2")
                .get();

        // Print every div found in the HTML returned by the server
        for (Element element : doc.select("div")) {
            System.out.println(element);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return paths;
}

Does anyone have an idea what's wrong? This is the test page; it normally uses divs like every other page, doesn't it?

Jim

1 Answer


You don't get any results because Instagram loads those pictures asynchronously with JavaScript (if you disable JavaScript in your browser, you will no longer see the pictures), so they are not present in the HTML that the server initially returns. Unfortunately, jsoup does not execute JavaScript, so you should either use another library that can handle it, or parse the JSON object assigned to the window._sharedData variable yourself; it contains the URLs pointing to the thumbnails and the original pictures.
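As a rough illustration of the second option, the sketch below pulls the JSON text out of the page source using only the standard library. It assumes the page embeds the data as `window._sharedData = {...};` immediately before a closing `</script>` tag; the class name `SharedDataExtractor`, the method `extractSharedData`, and the sample HTML are hypothetical, and the real page layout may differ or change over time.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SharedDataExtractor {

    // Assumption: the JSON object is assigned to window._sharedData and the
    // statement ends with ";" right before the closing </script> tag.
    private static final Pattern SHARED_DATA = Pattern.compile(
            "window\\._sharedData\\s*=\\s*(\\{.*?\\})\\s*;\\s*</script>",
            Pattern.DOTALL);

    // Returns the raw JSON text, or an empty Optional if the pattern is absent.
    public static Optional<String> extractSharedData(String html) {
        Matcher m = SHARED_DATA.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical, heavily simplified stand-in for the real page source.
        String html = "<html><body><script type=\"text/javascript\">"
                + "window._sharedData = {\"tag\":{\"name\":\"test\"}};"
                + "</script></body></html>";
        System.out.println(extractSharedData(html).orElse("not found"));
    }
}
```

In the question's code, the input string could come from `doc.html()`. The extracted text would then still need to be handed to a JSON parser of your choice to walk down to the picture URLs; the key names inside the object are not shown here because they are not given in this thread.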

user2340612