I am trying to parse the XML output of Stanford NLP in Java:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader("<a>"+tagged+"</a>"));
Document doc = builder.parse(is);
doc.getDocumentElement().normalize();
NodeList nl=doc.getElementsByTagName("sentence");
The problem is that the XML output of Stanford NLP contains an unescaped " inside an attribute value, for example:
<word wid="9" pos="``" lemma=""">"</word>
Then, I get the error:
[Fatal Error] :11:34: Element type "word" must be followed by either attribute specifications, ">" or "/>".
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 11; columnNumber: 34; Element type "word" must be followed by either attribute specifications, ">" or "/>".
at java.xml/com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:261)
at java.xml/com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
at y.main(y.java:46)
I thought of replacing/escaping """ and >"< before parsing, but that is a non-standard approach and may break the entire XML.
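A minimal sketch of that escaping idea, for illustration only. It assumes the only malformed construct is an attribute whose entire value is a single bare double quote (as in lemma="""), and it rewrites just that pattern to &quot; before parsing. The class name QuoteAttributeFixer and the method escapeBareQuoteAttributes are hypothetical, and the sample input is hand-written rather than real tagger output. The " in the element text (>"<) is left alone, since a double quote is legal in XML character data.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class QuoteAttributeFixer {
    // Matches =""" only when it ends an attribute, i.e. when it is followed
    // by whitespace, '>' or '/'. An empty attribute (="") is not matched.
    private static final Pattern BARE_QUOTE_VALUE =
            Pattern.compile("=\"\"\"(?=[\\s>/])");

    // Hypothetical helper: turns attr=""" into attr="&quot;".
    public static String escapeBareQuoteAttributes(String xml) {
        return BARE_QUOTE_VALUE.matcher(xml).replaceAll("=\"&quot;\"");
    }

    public static void main(String[] args) throws Exception {
        // Hand-written sample standing in for the tagger output (assumption).
        String tagged = "<sentence id=\"0\">"
                + "<word wid=\"9\" pos=\"``\" lemma=\"\"\">\"</word>"
                + "</sentence>";

        String fixed = escapeBareQuoteAttributes(tagged);

        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new InputSource(new StringReader("<a>" + fixed + "</a>")));

        Element word = (Element) doc.getElementsByTagName("word").item(0);
        System.out.println(word.getAttribute("lemma")); // prints: "
        System.out.println(word.getTextContent());      // prints: "
    }
}
```

This only covers the one pattern shown above. A value such as lemma="a"b" would still fail to parse, so the concern about the approach being fragile remains.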