
I am trying to use a Jsoup selector to match an element nested several levels deep, but it returns empty results.

HTML structure

<body>
    <div>
        <div>
            <div class="classA"/>
        </div>
    </div>
</body>

Java code

Document doc = Jsoup.connect("https://someUrl//url").get();
int size = doc.body().select(".classA").size(); // Returns 0
Pratik

1 Answer


Your code works for me using the latest Jsoup version, 1.11.3.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Document doc = Jsoup.parse("<body> <div> <div> <div class=\"classA\"/> </div> </div> </body>");
int size = doc.body().select(".classA").size();
System.out.println(size);   // displays: 1
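Because the question is about matching at nested levels: a plain class selector such as .classA (and the descendant combinator, a space) matches at any depth, while the child combinator > only matches direct children. The sketch below shows this on the same structure, with an explicit closing tag on the innermost div; the class name SelectorDepthDemo is hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorDepthDemo {
    // Same nesting as in the question, innermost div closed explicitly
    static final String HTML =
            "<body><div><div><div class=\"classA\"></div></div></div></body>";

    // Parse the sample HTML and count elements matching the CSS query
    static int count(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        System.out.println(count(".classA"));                    // 1: matches at any depth
        System.out.println(count("body .classA"));               // 1: descendant combinator
        System.out.println(count("body > .classA"));             // 0: direct children of body only
        System.out.println(count("body > div > div > .classA")); // 1: full child path
    }
}
```

So if .classA returns 0 on a real page, the nesting depth is not the issue; the element is simply absent from the HTML Jsoup received.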

Possible causes of your problem:

  • You're using an older Jsoup version, somewhere from 1.9.2 up to (but not including) 1.10.3, which had a bug that caused class names to be stored only in lowercase. See https://github.com/jhy/jsoup/issues/814 and https://github.com/jhy/jsoup/issues/830. This was fixed in version 1.10.3.
  • The website you're trying to parse loads additional content with JavaScript (AJAX). Jsoup can only "see" the original HTML as served, before any JavaScript modifications. To see that original HTML, open the page in a web browser and press CTRL+U (View Source). Don't use the debugger, Firebug, or Inspect, because they display the final HTML after JavaScript has already modified it.
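To illustrate the second cause: Jsoup parses HTML but never executes scripts, so an element that only a script would create does not exist in the parsed document. The sketch below uses a hypothetical page (the HTML string and the class name StaticHtmlDemo are made up for illustration) where the classA div is inserted by JavaScript.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StaticHtmlDemo {
    // Hypothetical page: the classA div only appears when a browser runs the script
    static final String HTML = "<body><div id=\"root\"></div>"
            + "<script>document.getElementById('root').innerHTML ="
            + " '<div class=\"classA\"></div>';</script></body>";

    // Parse the HTML and count .classA elements in the resulting tree
    static int countClassA(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select(".classA").size();
    }

    public static void main(String[] args) {
        // The script body is kept as raw data, not parsed as markup or executed,
        // so the div it would build is not part of the parsed tree
        System.out.println(countClassA(HTML)); // 0

        // The script text itself is still present, just never run
        Document doc = Jsoup.parse(HTML);
        System.out.println(doc.select("script").first().data().contains("classA")); // true
    }
}
```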

Comment response:

It's hard to guess without the URL you're trying to parse. Your browser probably loads the dynamic parts of this webpage from different URLs. You could try fetching and parsing only those HTML fragments. Check my answer here: How to Load Entire Contents of HTML - Jsoup
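A minimal sketch of the fragment approach, assuming you have already found the URL the page loads its dynamic content from (for example in the browser's developer tools network panel). The class name FragmentDemo, the helper methods, and the sample fragment are hypothetical; main parses a literal string as a stand-in for the fetched response, so it needs no network access.

```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FragmentDemo {
    // Fetch the URL that serves the dynamic part of the page (placeholder, not called in main)
    static Document fetchFragment(String fragmentUrl) throws IOException {
        return Jsoup.connect(fragmentUrl)
                .ignoreContentType(true) // avoid an exception if it is not served as text/html
                .get();
    }

    // Parse an HTML fragment (no <html>/<body> wrapper) and count matches
    static int countMatches(String fragmentHtml, String cssQuery) {
        Document doc = Jsoup.parseBodyFragment(fragmentHtml);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Stand-in for the body of a fragment response
        String fragment = "<div><div class=\"classA\">loaded later</div></div>";
        System.out.println(countMatches(fragment, ".classA")); // 1
    }
}
```

Whether this works depends on the site: if the endpoint returns JSON rather than HTML, you would need a JSON parser instead of Jsoup.parseBodyFragment.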

Krystian G
  • Thanks @Krystian. I think my issue is the second one. Is there any way I can force Jsoup to parse the page only after JavaScript has updated the DOM elements? – Pratik Mar 18 '19 at 03:56