First, if you are actually parsing HTML, use an HTML parser: real-world HTML is often not well-formed XML, so an XML parser will fail on it. That said, if your input is well-formed XML, you can do this with SimpleXml and a little extra code.
These are the steps:
- Create an instance of the parser (no configuration is necessary; the parser deliberately leaves out DTD and namespace support)
- Parse the XML into a DOM tree
- Call getElementsByTagName(root, "table") to collect all the table elements
- Loop through the list and convert DOM elements to POJOs
It turns out that SimpleXml doesn't provide a getElementsByTagName() out of the box, so I wrote one. Here is the full code:
import xmlparser.XmlParser;
import xmlparser.annotations.XmlName;
import xmlparser.annotations.XmlTextNode;
import xmlparser.model.XmlElement;

import java.util.ArrayList;
import java.util.List;

public final class Question {

    private static final String xml =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
        "<!DOCTYPE note SYSTEM \"Note.dtd\">\n" +
        "<note>\n" +
        "<to>Tove</to>\n" +
        "<from>Jani</from>\n" +
        "<heading>Reminder</heading>\n" +
        "<body>Don't forget me this weekend!</body>\n" +
        "<sometag>\n" +
        "<table>\n" +
        "<tr><td>content</td></tr>\n" +
        "</table>\n" +
        "</sometag>\n" +
        "</note>\n";

    // POJOs mirroring the table structure; field names match the child element names (tr, td)
    @XmlName("table")
    private static class Table {
        private List<Question.Row> tr;
    }

    private static class Row {
        private List<Cell> td;
    }

    private static class Cell {
        // Holds the text inside the <td> element
        @XmlTextNode
        private String text;
    }

    public static void main(final String... args) {
        final XmlParser simple = new XmlParser();

        // Parse the whole document into a DOM tree
        final XmlElement xmlElement = simple.fromXml(xml);

        // Collect every <table> element, then convert each one to a POJO
        final List<XmlElement> list = getElementsByTagName(xmlElement, "table");
        for (final XmlElement element : list) {
            Table table = simple.fromXml(element, Table.class);
            System.out.println(table.tr.get(0).td.get(0).text);
        }
    }

    // Returns all elements with the given name, including the root element if it matches
    private static List<XmlElement> getElementsByTagName(final XmlElement element, final String name) {
        final List<XmlElement> list = new ArrayList<>();
        getElementsByTagName(element, name, list);
        return list;
    }

    // Recursive depth-first walk that appends matching elements to the list
    private static void getElementsByTagName(final XmlElement element, final String name, final List<XmlElement> list) {
        if (element == null) return;
        if (name.equals(element.name)) list.add(element);
        if (element.children == null) return;
        for (final XmlElement child : element.children) {
            getElementsByTagName(child, name, list);
        }
    }
}
The output of the code will be a single line with the word 'content'.
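One thing worth knowing about the recursive helper is how it behaves when matching elements are nested. The sketch below reproduces the same depth-first walk without the library so it can be run on its own. `Node`, `findByName` and `collect` are hypothetical stand-ins written for this illustration; they are not part of SimpleXml.

```java
import java.util.ArrayList;
import java.util.List;

final class TraversalSketch {

    // Hypothetical stand-in for an XML element: just a name and a list of children
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(final String name, final Node... kids) {
            this.name = name;
            for (final Node kid : kids) children.add(kid);
        }
    }

    // Depth-first, pre-order: a matching parent is added before its matching descendants
    static List<Node> findByName(final Node root, final String name) {
        final List<Node> found = new ArrayList<>();
        collect(root, name, found);
        return found;
    }

    private static void collect(final Node node, final String name, final List<Node> found) {
        if (node == null) return;
        if (name.equals(node.name)) found.add(node);
        for (final Node child : node.children) collect(child, name, found);
    }

    public static void main(final String... args) {
        // note > sometag > table > tr > td > table (a table nested inside a table)
        final Node inner = new Node("table");
        final Node outer = new Node("table", new Node("tr", new Node("td", inner)));
        final Node root = new Node("note", new Node("to"), new Node("sometag", outer));

        final List<Node> tables = findByName(root, "table");
        System.out.println(tables.size());          // prints 2
        System.out.println(tables.get(0) == outer); // prints true
    }
}
```

Both the outer and the inner table are returned, in document order. If you only want top-level tables, one option is to stop descending into an element once it has matched.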
SimpleXml is available on Maven Central: https://mvnrepository.com/artifact/com.github.codemonstur/simplexml/2.8.1
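On the HTML caveat from the start of this answer: the short check below shows why it matters. It uses the JDK's built-in XML parser rather than SimpleXml (that substitution is an assumption made so the example runs without extra dependencies), and `isWellFormedXml` is a hypothetical helper name. Typical HTML such as an unclosed `<br>` is rejected as malformed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

final class HtmlIsNotXml {

    // Returns true if the JDK's XML parser accepts the input as well-formed XML
    static boolean isWellFormedXml(final String input) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (final Exception e) {
            // Malformed markup surfaces as a SAXException; any failure counts as "not parseable" here
            return false;
        }
    }

    public static void main(final String... args) {
        System.out.println(isWellFormedXml("<p>one<br/>two</p>")); // prints true
        System.out.println(isWellFormedXml("<p>one<br>two</p>"));  // prints false
    }
}
```

The JDK parser may also log the parse error to stderr for the second input. For markup like that, an HTML parser is the better tool.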