
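This is my Java code:

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SentenceNode {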
    Node xscope;
    Node cue;
}

List<SentenceNode> getSentenceNodes(InputSource is) {
    List<SentenceNode> sentenceNodes = new ArrayList<SentenceNode>();
    try {

        Object xscopes = XPathFactory
                .newInstance()
                .newXPath()
                .evaluate("//xscope/cue", is,
                        XPathConstants.NODESET);
        if (xscopes != null) {
            NodeList cuesNodes = (NodeList) xscopes;
            for (int i = 0; i < cuesNodes.getLength(); i++) {
                SentenceNode sentenceNode = new SentenceNode();
                Node cue = cuesNodes.item(i);
                sentenceNode.cue = cue;
                NodeList xscope = cue.getParentNode().getParentNode()
                        .getChildNodes();
                for (int j = 0; j < xscope.getLength(); j++) {
                    Node n = xscope.item(j);
                    if (n.getNodeName().equals("xscope")) {
                        sentenceNode.xscope = n;
                        break;
                    }
                }
                sentenceNodes.add(sentenceNode);

            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

    return sentenceNodes;
}

public void displaySentenceNodes() throws ClassNotFoundException, ClassCastException,
        IOException {
    InputSource is = new InputSource(new StringReader("TestBIO.xml"));
    List<SentenceNode> nodes = getSentenceNodes(is);
    for (SentenceNode node : nodes) {
        System.out.println("Xscope: " + node.xscope.getTextContent());
        System.out.println("Cue: " + node.cue.getTextContent());
    }
}

I want to extract each sentence from this XML together with its cue and xscope. For each sentence I want to obtain the xscope and the cue; if a sentence has several xscopes and cues, I want to obtain all of them. Here is my XML:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Annotation created="22/2/2010" creator="BioscopeWriterCasConsumer">
    <DocumentSet>
        <Document type="Biological_abstract">
            <DocID type="PMID">1984449</DocID>
            <DocumentPart type="AbstractText">
                <sentence>When cells were infected with HIV, no induction of NF-KB factor was detected, <xscope>whereas high level of progeny virions was produced, <cue>suggesting</cue> that</xscope>.</sentence>
                <sentence> HIV <xscope><cue>could</cue> mimic some differentiation/activation stimuli allowing nuclear NF-KB expression</xscope>.</sentence>
            </DocumentPart>
        </Document>
    </DocumentSet>
</Annotation>
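For comparison, the grouping described above (every xscope and every cue of each sentence) can be sketched with plain DOM calls by iterating over the sentence elements instead of over the cues. This is a minimal sketch under stated assumptions, not the asker's code: SentenceScopes, the SentenceInfo holder and the collect method are hypothetical names, and the XML is passed in as text rather than read from a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SentenceScopes {

    // Hypothetical holder: one entry per <sentence>, with every <xscope> and <cue> inside it
    static class SentenceInfo {
        final String sentence;
        final List<String> xscopes = new ArrayList<>();
        final List<String> cues = new ArrayList<>();

        SentenceInfo(String sentence) {
            this.sentence = sentence;
        }
    }

    // Parses XML text (not a file name) and groups xscope/cue texts by sentence
    static List<SentenceInfo> collect(String xmlText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xmlText)));
        List<SentenceInfo> result = new ArrayList<>();
        NodeList sentences = doc.getElementsByTagName("sentence");
        for (int i = 0; i < sentences.getLength(); i++) {
            Element sentence = (Element) sentences.item(i);
            SentenceInfo info = new SentenceInfo(sentence.getTextContent());
            // Element.getElementsByTagName searches all descendants of the sentence
            NodeList xscopes = sentence.getElementsByTagName("xscope");
            for (int j = 0; j < xscopes.getLength(); j++) {
                info.xscopes.add(xscopes.item(j).getTextContent());
            }
            NodeList cues = sentence.getElementsByTagName("cue");
            for (int j = 0; j < cues.getLength(); j++) {
                info.cues.add(cues.item(j).getTextContent());
            }
            result.add(info);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Annotation><DocumentSet><Document><DocumentPart>"
                + "<sentence> HIV <xscope><cue>could</cue> mimic some stimuli</xscope>.</sentence>"
                + "</DocumentPart></Document></DocumentSet></Annotation>";
        for (SentenceInfo info : collect(xml)) {
            System.out.println("Sentence: " + info.sentence);
            System.out.println("Xscopes: " + info.xscopes);
            System.out.println("Cues: " + info.cues);
        }
    }
}
```

Because the search covers all descendants, a nested xscope would be listed on its own and also appear inside the text of the scope that contains it.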

An error occurs when I try to parse the XML file:

[Fatal Error] :1:1: Content is not allowed in prolog.
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(Unknown Source)
Nadd

2 Answers


You are missing the ? character in the XML. It should start with:

  <?xml version="
Ori Marko
    It appears in my XML file, but I missed it here in the post. I have corrected the post now, so that is not the problem. Thank you – Nadd Jul 08 '17 at 09:45
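The stack trace is consistent with a different cause: new StringReader("TestBIO.xml") does not open a file. It hands the parser the characters TestBIO.xml as the document itself, and the T at line 1, column 1 is not allowed before the root element, which matches "Content is not allowed in prolog" at :1:1. A minimal sketch of the difference, assuming the file sits in the working directory; CueLoader, cuesFromFile and fileNameParsedAsXmlFails are hypothetical names.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CueLoader {

    static final String EXPR = "//xscope/cue";

    // Evaluates the XPath against the file at `path`. Passing bytes (an InputStream)
    // lets the parser read encoding="UTF-8" from the XML declaration.
    static NodeList cuesFromFile(String path) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try (InputStream in = new FileInputStream(path)) {
            return (NodeList) xpath.evaluate(EXPR, new InputSource(in),
                    XPathConstants.NODESET);
        }
    }

    // Mirrors the question: the StringReader turns the file name into the document
    // text, so parsing fails at line 1, column 1. Returns true if evaluation fails.
    static boolean fileNameParsedAsXmlFails(String fileName) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            xpath.evaluate(EXPR, new InputSource(new StringReader(fileName)),
                    XPathConstants.NODESET);
            return false;
        } catch (XPathExpressionException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumes TestBIO.xml is in the current working directory
        NodeList cues = cuesFromFile("TestBIO.xml");
        for (int i = 0; i < cues.getLength(); i++) {
            System.out.println("Cue: " + cues.item(i).getTextContent());
        }
    }
}
```

new InputSource(new File("TestBIO.xml").toURI().toString()) should also work, because the one-argument String constructor of InputSource takes a system ID (a URI), not XML text.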

I found an equivalent approach. It works like the XPath version, but parses the XML with a DOM parser and works bottom-up, from each cue to its enclosing sentence. Here is the code:

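import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class SentenceNode {
    Node xscope;
    Node cue;
}

List<SentenceNode> extractElem(String file) throws ParserConfigurationException,
        SAXException, IOException {
    List<SentenceNode> sentenceNodes = new ArrayList<SentenceNode>();
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    // Parse the file that was passed in rather than a hard-coded name
    Document document = builder.parse(new File(file));
    document.getDocumentElement().normalize();
    NodeList nList = document.getElementsByTagName("cue");
    for (int temp = 0; temp < nList.getLength(); temp++) {
        SentenceNode sentNode = new SentenceNode();
        Node nodeCue = nList.item(temp);
        sentNode.cue = nodeCue;
        // The cue sits inside an <xscope>; its grandparent is usually the <sentence>
        Node grandParent = nodeCue.getParentNode().getParentNode();
        NodeList xscope = null;
        if (grandParent.getNodeName().equals("sentence")) {
            xscope = grandParent.getChildNodes();
        } else if (grandParent.getNodeName().equals("xscope")) {
            // Nested xscope: go up one more level to reach the sentence
            xscope = grandParent.getParentNode().getChildNodes();
        }
        // Guard against cues that are not inside a sentence (avoids a NullPointerException)
        if (xscope != null) {
            // Keep the first <xscope> child of the sentence
            for (int j = 0; j < xscope.getLength(); j++) {
                Node n = xscope.item(j);
                if (n.getNodeName().equals("xscope")) {
                    sentNode.xscope = n;
                    break;
                }
            }
        }
        sentenceNodes.add(sentNode);
    }
    return sentenceNodes;
}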

And it worked.
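The DOM version above locates the xscope by counting getParentNode() calls and then takes the first xscope child of the sentence, so in a sentence with two scopes both cues would be paired with the first one. A hedged alternative is to walk up from each cue to its nearest xscope ancestor. The sketch assumes the same element names; CueScopes, enclosingElement and pairCuesWithScopes are hypothetical names, and the XML is passed in as text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CueScopes {

    // Returns the nearest ancestor element called `name`, or null if there is none
    static Node enclosingElement(Node node, String name) {
        Node current = node.getParentNode();
        // The Document node is not an ELEMENT_NODE, so the walk stops above the root element
        while (current != null && current.getNodeType() == Node.ELEMENT_NODE) {
            if (name.equals(current.getNodeName())) {
                return current;
            }
            current = current.getParentNode();
        }
        return null;
    }

    // Pairs each <cue> with the <xscope> that actually contains it, as "cue -> xscope" strings
    static List<String> pairCuesWithScopes(String xmlText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xmlText)));
        List<String> pairs = new ArrayList<>();
        NodeList cues = doc.getElementsByTagName("cue");
        for (int i = 0; i < cues.getLength(); i++) {
            Node cue = cues.item(i);
            Node xscope = enclosingElement(cue, "xscope");
            String scopeText = xscope == null ? "(none)" : xscope.getTextContent();
            pairs.add(cue.getTextContent() + " -> " + scopeText);
        }
        return pairs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sentence><xscope><cue>may</cue> bind</xscope> and "
                + "<xscope><cue>possibly</cue> activate</xscope>.</sentence>";
        pairCuesWithScopes(xml).forEach(System.out::println);
    }
}
```

For nested scopes this returns the innermost xscope; calling enclosingElement(cue, "sentence") instead gives the sentence the cue belongs to.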

Nadd