The HTML

<td> SCH4UE-01 : Chemistry <br> Block: 1 - rm. 315 </br></td>

I don't want the br tag or the text after it, only the text before it (SCH4UE-01 : Chemistry).

CSS queries I have tried

td:eq(0) outputs: SCH4UE-01 : Chemistry Block: 1 - rm. 315

however

br outputs: Block: 1 - rm. 315

ArK
antonky
  • The `td` tag should be inside a `table`. Please add the full HTML so we can help you. – TDG Sep 13 '16 at 18:25

1 Answer

The <br> tag is an empty tag, which means that it has no end tag.

See: http://www.w3schools.com/tags/tag_br.asp

After replacing your </br> tag with <br> (if you print the jsoup document, you will see that jsoup fixes such mistakes automatically), your <td> tag has four child nodes:

  • #text
  • br
  • #text
  • br

So the text SCH4UE-01 : Chemistry is the first child node (element.childNode(0)).

Code

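import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The question's fragment wrapped in a table, with </br> written as <br>
String htmlString = "<html><body><table><td> SCH4UE-01 : Chemistry <br> Block: 1 - rm. 315 <br></td></table></body></html>";

Document doc = Jsoup.parse(htmlString);

Elements tdElements = doc.select("td");

for (Element tdElement : tdElements){
    // childNode(0) is the first text node, i.e. everything before the first <br>
    System.out.println(tdElement.childNode(0));
}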

Output

 SCH4UE-01 : Chemistry 
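
If you want to check the node structure described above without any dependency, here is a minimal sketch that uses the JDK's built-in XML DOM parser instead of jsoup. It is not the jsoup API: the class `ChildNodeSketch` and its helper methods are hypothetical names, and it assumes the cell is rewritten as well-formed XML (`<br/>` instead of `<br>`). It only illustrates that text nodes count as child nodes but not as elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of jsoup: it uses the JDK's XML DOM parser.
class ChildNodeSketch {

    // Assumption: the cell is written as well-formed XML (<br/> instead of <br>).
    static final String CELL =
            "<td> SCH4UE-01 : Chemistry <br/> Block: 1 - rm. 315 <br/></td>";

    // Parses the fragment and returns its root element (the td).
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse fragment", e);
        }
    }

    // Names of all direct child nodes, text nodes included.
    static List<String> childNodeNames(String xml) {
        NodeList children = parseRoot(xml).getChildNodes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    // Number of direct children that are elements (text nodes are skipped).
    static int elementChildCount(String xml) {
        NodeList children = parseRoot(xml).getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    // Trimmed text of the first child node (the text before the first br).
    static String firstChildText(String xml) {
        return parseRoot(xml).getFirstChild().getTextContent().trim();
    }

    public static void main(String[] args) {
        System.out.println(childNodeNames(CELL));
        System.out.println(elementChildCount(CELL));
        System.out.println(firstChildText(CELL));
    }
}
```

Running `main` should list four child nodes (`#text`, `br`, `#text`, `br`) but only two element children, which is why a selector that matches elements cannot pick out the text on its own.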
Frederic Klein
  • This works, but why doesn't the CSS query work? Isn't `:eq(0)` equivalent to `childNode(0)`? – antonky Sep 14 '16 at 12:49
  • `:eq(n)`: "find elements whose sibling index is equal to n". So `td:eq(0)` matches every `td` whose sibling index is 0, not the first child node of the `td`. Also: TextNodes aren't HTML elements, so `tdElement.children().size()` is 2 (only the two `<br>` tags). See: http://stackoverflow.com/questions/5688712/is-there-a-css-selector-for-text-nodes
    – Frederic Klein Sep 14 '16 at 13:01