I am trying to parse an HTML document using jsoup, and I want to allow the <table> tag but not the <tbody> tag.
I have seen this link:
Jsoup parsing an Html file with a tbody tag
and I tried the following:
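import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

// relaxed() already allows table and tbody; font is not in its default set
Whitelist whiteList = Whitelist.relaxed();
whiteList.addTags("table");
whiteList.addTags("font");
whiteList.addAttributes("table", "align");
whiteList.addAttributes("tr", "align");
// Enabling the next line makes isValid return false for the HTML below
//whiteList.removeTags("tbody");
String html = "<table>"
    + "<tr align='top'>"
    + "<th><font>Link</th>"
    + "</tr>"
    + "</table>";
boolean valid = Jsoup.isValid(html, whiteList);
System.out.println(valid);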
If I uncomment the commented line, I get false.
Also changing it to:
Document document = Jsoup.parse(html,"",Parser.xmlParser());
doesn't have much of an effect.
Is there any workaround for this?
I want to allow <table> but not allow <tbody>.
PS: I have thought of checking for <tbody> before parsing, but I feel that is not a very good solution.
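
For reference, the pre-check I have in mind would look roughly like the sketch below. It only uses the standard library and scans the raw string for a literal tbody tag before jsoup ever sees it. The class name `TbodyPreCheck` and the method `containsExplicitTbody` are names I made up for illustration; they are not part of jsoup.

```java
import java.util.regex.Pattern;

public class TbodyPreCheck {
    // Matches a literal opening or closing tbody tag, ignoring case.
    // Assumption: a plain text scan is acceptable; it also matches tbody
    // tags that appear inside comments or attribute values.
    private static final Pattern TBODY_TAG =
            Pattern.compile("</?tbody\\b", Pattern.CASE_INSENSITIVE);

    // Returns true if the raw HTML source contains an explicit tbody tag.
    public static boolean containsExplicitTbody(String html) {
        return TBODY_TAG.matcher(html).find();
    }

    public static void main(String[] args) {
        String withoutTbody = "<table><tr align='top'><th><font>Link</th></tr></table>";
        String withTbody = "<table><tbody><tr><td>x</td></tr></tbody></table>";

        System.out.println(containsExplicitTbody(withoutTbody)); // false
        System.out.println(containsExplicitTbody(withTbody));    // true
    }
}
```

The result could then gate the call to Jsoup.isValid, but because it is a text match rather than a real parse, it can be fooled by commented-out markup, which is why I would prefer a jsoup-level solution.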