According to this answer:

HTML 4.01 specifies that <a> elements may only contain inline elements. A <div> is a block element, so it may not appear inside an <a>.

But...

HTML5 allows <a> elements to contain blocks.

Well, I just tried selecting a <div class="m"> within an <a> block, using:

Elements elems = a.select("m");

and elems comes back empty, despite the div being there.

So I am thinking: Either I am not using the correct syntax for selecting a div within an a or... Jsoup doesn't support this HTML5-only feature?

What is the right Jsoup syntax for selecting a div within an a?

Update: I just tried

Elements elems = a.getElementsByClass("m");

And Jsoup had no problems with it (i.e. it returns the correct number of such divs within a).

So my question now is: Why?

Why does a.getElementsByClass("m") work whereas a.select("m") doesn't?

Update: I just tried, per @Delan Azabani's suggestion:

Elements elems = a.select(".m");

and it worked. So basically the a.select() works but I was missing the . in front of the class name.

Regex Rookie

2 Answers

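The select function takes a CSS selector. If you pass 'm' as the argument, it is treated as a tag name, so jsoup looks for <m> elements nested inside the a element and finds none. You need to pass '.m' instead, which matches elements with the class m under the a element. getElementsByClass("m") works because it expects a bare class name rather than a selector.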
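To make the difference concrete, here is a minimal sketch, assuming jsoup is on the classpath. The class name SelectorDemo, the helper countUnderAnchor, and the sample markup are made up for illustration; only Jsoup.parse, select, first, and getElementsByClass are jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class SelectorDemo {
    // Hypothetical sample markup: a single div with class "m" inside an anchor.
    static final String HTML = "<a href='./'><div class=m>Check</div></a>";

    // Counts the matches of a CSS selector, searching from the first <a> element.
    static int countUnderAnchor(String selector) {
        Element a = Jsoup.parse(HTML).select("a").first();
        return a.select(selector).size();
    }

    // Counts elements with the given bare class name under the first <a> element.
    static int countByClass(String className) {
        Element a = Jsoup.parse(HTML).select("a").first();
        return a.getElementsByClass(className).size();
    }

    public static void main(String[] args) {
        System.out.println("select(\"m\"):     " + countUnderAnchor("m"));     // tag selector
        System.out.println("select(\".m\"):    " + countUnderAnchor(".m"));    // class selector
        System.out.println("select(\"div.m\"): " + countUnderAnchor("div.m")); // tag + class
        System.out.println("getElementsByClass(\"m\"): " + countByClass("m"));
    }
}
```

With this markup, the tag selector "m" should match nothing, while ".m", "div.m", and getElementsByClass("m") should each find the one div.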

Delan Azabani

The current version of jsoup (1.5.2) does support div tags nested within a tags.

In situations like this I suggest printing out the parse tree. That confirms whether jsoup has parsed the HTML the way you expect and, if it hasn't, shows you which selector to use against the actual tree.

E.g.:

Document doc = Jsoup.parse("<a href='./'><div class=m>Check</div></a>");
System.out.println("Parse tree:\n" + doc);
Elements divs = doc.select("a .m");
System.out.println("\nDiv in A:\n" + divs);

Gives:

Parse tree:
<html>
 <head></head>
 <body>
  <a href="./">
   <div class="m">
    Check
   </div></a>
 </body>
</html>

Div in A:
<div class="m">
 Check
</div>
Jonathan Hedley
  • Indeed, thanks to a tip from @BalusC I regularly print the parse tree whenever I encounter an unexpected result. Still, I noticed that you used `doc.select()` whereas I am using `a.select()` (where 'a' is the already successfully extracted anchor element). I have no problem using `doc.select()` where `m` is prefixed by `div.`. I am trying to figure out the exact rules of the syntax for `a.select()`. +1. – Regex Rookie Apr 29 '11 at 11:02
  • element.select() is the same as doc.select(), but it only looks down from the element it was initialised on. (A Document extends Element). – Jonathan Hedley Apr 29 '11 at 13:08
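The scoping described in the last comment can be sketched as follows, again assuming jsoup is on the classpath. The class name ScopeDemo, its helper methods, and the sample markup (one div.m outside the anchor, one inside) are hypothetical and chosen only to show the difference in scope.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScopeDemo {
    // Hypothetical sample markup: one div.m outside the anchor, one inside it.
    static final String HTML =
        "<div class=m>outside</div><a href='./'><div class=m>inside</div></a>";

    // Runs the selector against the whole document.
    static int countInDocument(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    // Runs the selector starting from the first <a> element only.
    static int countInAnchor(String selector) {
        Document doc = Jsoup.parse(HTML);
        Element a = doc.select("a").first();
        return a.select(selector).size();
    }

    public static void main(String[] args) {
        System.out.println("doc.select(\".m\"):   " + countInDocument(".m"));
        System.out.println("a.select(\".m\"):     " + countInAnchor(".m"));
        System.out.println("doc.select(\"a .m\"): " + countInDocument("a .m"));
    }
}
```

Here doc.select(".m") should see both divs, while a.select(".m") and doc.select("a .m") should each see only the div nested in the anchor, which is why the same selector syntax applies to both calls.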