This is my Java code:
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Pairs one <cue> element with the <xscope> element that contains it.
class SentenceNode {
    Node xscope;
    Node cue;
}

List<SentenceNode> getSentenceNodes(InputSource is) {
    List<SentenceNode> sentenceNodes = new ArrayList<SentenceNode>();
    try {
        // Select every <cue> that is a direct child of an <xscope>.
        Object xscopes = XPathFactory
                .newInstance()
                .newXPath()
                .evaluate("//xscope/cue", is,
                        XPathConstants.NODESET);
        if (xscopes != null) {
            NodeList cuesNodes = (NodeList) xscopes;
            for (int i = 0; i < cuesNodes.getLength(); i++) {
                SentenceNode sentenceNode = new SentenceNode();
                Node cue = cuesNodes.item(i);
                sentenceNode.cue = cue;
                // The XPath only matches xscope/cue, so the cue's parent is its own <xscope>.
                // (Searching the grandparent's children would always return the first
                // <xscope> of the sentence, even when the cue belongs to a later one.)
                sentenceNode.xscope = cue.getParentNode();
                sentenceNodes.add(sentenceNode);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return sentenceNodes;
}

public void displaySentenceNodes() throws ClassNotFoundException, ClassCastException,
        IOException {
    InputSource is = new InputSource(new StringReader("TestBIO.xml"));
    List<SentenceNode> nodes = getSentenceNodes(is);
    for (SentenceNode node : nodes) {
        System.out.println("Xscope: " + node.xscope.getTextContent());
        System.out.println("Cue: " + node.cue.getTextContent());
    }
}
For each sentence in this XML I want to extract its xscope and cue. If a sentence has several xscopes and cues, I want to obtain all of them. Here is my XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Annotation created="22/2/2010" creator="BioscopeWriterCasConsumer">
<DocumentSet>
<Document type="Biological_abstract">
<DocID type="PMID">1984449</DocID>
<DocumentPart type="AbstractText">
<sentence>When cells were infected with HIV, no induction of NF-KB factor was detected, <xscope>whereas high level of progeny virions was produced, <cue>suggesting</cue> that</xscope>.</sentence>
<sentence> HIV <xscope><cue>could</cue> mimic some differentiation/activation stimuli allowing nuclear NF-KB expression</xscope>.</sentence>
</DocumentPart>
</Document>
</DocumentSet>
</Annotation>
When I try to parse the XML file, I get this error:
[Fatal Error] :1:1: Content is not allowed in prolog.
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(Unknown Source)
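A minimal sketch of what is probably happening, assuming the intent was to read the file TestBIO.xml from disk: a StringReader hands its argument to the parser as the document text itself, so the parser sees the characters "TestBIO.xml" instead of the file's contents and rejects them at line 1, column 1. The class name InputSourceSketch and the helpers cueTexts and fromFile are made up for this illustration; the one-sentence XML string is a shortened stand-in for the real file.

```java
import java.io.File;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InputSourceSketch {

    // Hypothetical helper: returns the text of every <cue> directly under an <xscope>.
    static List<String> cueTexts(InputSource source) throws Exception {
        NodeList cues = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//xscope/cue", source, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < cues.getLength(); i++) {
            texts.add(cues.item(i).getTextContent());
        }
        return texts;
    }

    // Hypothetical helper: builds an InputSource that points at a file on disk
    // by passing the file's URI as the system identifier.
    static InputSource fromFile(String path) {
        return new InputSource(new File(path).toURI().toString());
    }

    public static void main(String[] args) throws Exception {
        // Case 1: StringReader wraps actual XML text, so parsing works.
        String xml = "<sentence>HIV <xscope><cue>could</cue> mimic stimuli</xscope>.</sentence>";
        System.out.println(cueTexts(new InputSource(new StringReader(xml))));

        // Case 2: StringReader wraps a file NAME. The parser treats the name
        // as the document and fails before reaching any XML.
        try {
            cueTexts(new InputSource(new StringReader("TestBIO.xml")));
        } catch (Exception e) {
            System.out.println("Parsing the file name as XML failed: " + e);
        }

        // Case 3: when a path is given on the command line, read that file.
        if (args.length > 0) {
            System.out.println(cueTexts(fromFile(args[0])));
        }
    }
}
```

In the code from the question, this would mean replacing `new InputSource(new StringReader("TestBIO.xml"))` with an InputSource built from the file's location or from a stream over the file. If the error persists after that change, another common cause is stray bytes (such as a byte order mark) in front of the `<?xml` declaration.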