How to parse the following XML using XPath in Java? I want to extract the cue and the xscope from each sentence.

Question

This is my Java code:

class SentenceNode {
    Node xscope;
    Node cue;
}

List<SentenceNode> getSentenceNodes(InputSource is) {
    List<SentenceNode> sentenceNodes = new ArrayList<SentenceNode>();
    try {

        Object xscopes = XPathFactory
                .newInstance()
                .newXPath()
                .evaluate("//xscope/cue", is,
                        XPathConstants.NODESET);
        if (xscopes != null) {
            NodeList cuesNodes = (NodeList) xscopes;
            for (int i = 0; i < cuesNodes.getLength(); i++) {
                SentenceNode sentenceNode = new SentenceNode();
                Node cue = cuesNodes.item(i);
                sentenceNode.cue = cue;
                NodeList xscope = cue.getParentNode().getParentNode()
                        .getChildNodes();
                for (int j = 0; j < xscope.getLength(); j++) {
                    Node n = xscope.item(j);
                    if (n.getNodeName().equals("xscope")) {
                        sentenceNode.xscope = n;
                        break;
                    }
                }
                sentenceNodes.add(sentenceNode);

            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

    return sentenceNodes;
}

public void displaySentenceNodes() throws ClassNotFoundException, ClassCastException,
        IOException {
    InputSource is = new InputSource(new StringReader("TestBIO.xml"));
    List<SentenceNode> nodes = getSentenceNodes(is);
    for (SentenceNode node : nodes) {

        System.out.println("Xscope: " + node.xscope.getTextContent());
        System.out
                .println("Cue: " + node.cue.getTextContent());

    }
}

I want to extract each sentence from this XML together with its cue and xscope. For each sentence I want to obtain the xscope and the cue. If a sentence has several xscopes and cues, I want to obtain all of them. Here is my XML:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Annotation created="22/2/2010" creator="BioscopeWriterCasConsumer">
<DocumentSet> 
        <Document type="Biological_abstract">
            <DocID type="PMID">1984449</DocID>
                <DocumentPart type="AbstractText">
                <sentence>When cells were infected with HIV, no induction of NF-KB factor was detected, <xscope>whereas high level of progeny virions was produced, <cue>suggesting</cue> that</xscope>.</sentence>
                <sentence> HIV <xscope><cue>could</cue> mimic some differentiation/activation stimuli allowing nuclear NF-KB expression</xscope>.</sentence>
                </DocumentPart>
        </Document>     
</DocumentSet>
</Annotation>

An error occurred when I tried to parse the XML file:

[Fatal Error] :1:1: Content is not allowed in prolog.
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(Unknown Source)

Yes, that was a problem, thank you! I corrected it, but the error still occurs. I think I must include the whole path from Annotation in the XPath, but I don't know how. – Nadd, Jul 08 '17 at 07:22
I think your XPath is okay... https://stackoverflow.com/questions/4569123/content-is-not-allowed-in-prolog-saxparserexception – OneCricketeer, Jul 08 '17 at 07:25
Then what is the problem? I don't know why it doesn't work! – Nadd, Jul 08 '17 at 07:53
Did you read the linked question? It mentions invisible characters in the XML, and the error says the problem is the very first character of the file. – OneCricketeer, Jul 08 '17 at 07:55
Sure, I read it! The XML looks like the one in the post; there is nothing special at line 1, column 1. Is it the XML version or the encoding? :( – Nadd, Jul 08 '17 at 08:03
Yes, but as mentioned, invisible characters could be there. Try creating a brand-new file, copying your XML into it, and running it again. – OneCricketeer, Jul 08 '17 at 08:04
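
One likely cause, assuming the code in the question is exactly what was run: new StringReader("TestBIO.xml") gives the parser the literal text TestBIO.xml as the document, not the contents of that file, so the parser meets the letter T at line 1, column 1 and reports content in the prolog. The sketch below reproduces this with a hypothetical helper, countCues (not part of the original code), and shows an InputSource built from the file's URI as one possible fix.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InputSourceDemo {

    // Hypothetical helper: counts <cue> elements found inside any <xscope>.
    static int countCues(InputSource source) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList cues = (NodeList) xpath.evaluate(
                "//xscope//cue", source, XPathConstants.NODESET);
        return cues.getLength();
    }

    // Builds an InputSource whose system ID is the file's URI,
    // so the parser opens and reads the file itself.
    static InputSource fromFile(String path) {
        return new InputSource(new File(path).toURI().toString());
    }

    public static void main(String[] args) throws Exception {
        // Here the reader's content really is XML, so parsing succeeds.
        String xml = "<sentence>HIV <xscope><cue>could</cue> mimic</xscope>.</sentence>";
        System.out.println(countCues(new InputSource(new StringReader(xml))));

        // Here the "document" is just the text TestBIO.xml, which is not XML.
        try {
            countCues(new InputSource(new StringReader("TestBIO.xml")));
        } catch (XPathExpressionException e) {
            System.out.println("Parsing failed: " + e.getCause());
        }

        // Reading the actual file (assumed to exist in the working directory):
        // countCues(fromFile("TestBIO.xml"));
    }
}
```

If the error remains once the file itself is being read, the invisible-character explanation from the comments (for example a byte order mark) becomes the more plausible one.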

score 0 · Answer 1 · answered Jul 08 '17 at 09:17

You are missing the ? character in the XML declaration. It should start with:

  <?xml version="

Ori Marko

It is present in my XML file; I only left it out here in the post. I have corrected the post now. That is not the problem. Thank you. – Nadd Jul 08 '17 at 09:45

score 0 · Answer 2 · answered Jul 08 '17 at 09:42

I found an equivalent approach. It does the same job as the XPath version, but parses the XML with a DOM parser and walks bottom-up from each cue. Here is the code:

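import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class SentenceNode {
    Node xscope;
    Node cue;
}

List<SentenceNode> extractElem(String file) throws ParserConfigurationException,
        SAXException, IOException {
    List<SentenceNode> sentenceNodes = new ArrayList<SentenceNode>();
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    // Parse the file passed in rather than a hard-coded name.
    Document document = builder.parse(new File(file));
    document.getDocumentElement().normalize();
    NodeList nList = document.getElementsByTagName("cue");
    for (int temp = 0; temp < nList.getLength(); temp++) {
        SentenceNode sentNode = new SentenceNode();
        Node nodeCue = nList.item(temp);
        sentNode.cue = nodeCue;

        // Walk up from the cue to the enclosing sentence and take its children.
        Node grandParent = nodeCue.getParentNode().getParentNode();
        NodeList xscope = null;
        if (grandParent.getNodeName().equals("sentence")) {
            // cue -> xscope -> sentence
            xscope = grandParent.getChildNodes();
        } else if (grandParent.getNodeName().equals("xscope")) {
            // cue -> xscope -> xscope -> sentence (nested scopes)
            xscope = grandParent.getParentNode().getChildNodes();
        }

        // Keep the first xscope child; skip the search if no known parent was found.
        for (int j = 0; xscope != null && j < xscope.getLength(); j++) {
            Node n = xscope.item(j);
            if (n.getNodeName().equals("xscope")) {
                sentNode.xscope = n;
                break;
            }
        }

        sentenceNodes.add(sentNode);
    }
    return sentenceNodes;
}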

And it worked.
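
For comparison, here is a minimal sketch of how the same extraction might look with XPath, as the question originally asked. It parses the document once, selects every sentence, and then evaluates relative expressions against each sentence so that all of its xscopes and cues are collected. The class SentenceScopes, the holder SentenceInfo and the helper texts are hypothetical names introduced for this sketch, and the sample XML is a shortened copy of the one in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SentenceScopes {

    // Hypothetical holder: one sentence plus the text of all its xscopes and cues.
    static class SentenceInfo {
        String sentence;
        List<String> xscopes = new ArrayList<>();
        List<String> cues = new ArrayList<>();
    }

    static List<SentenceInfo> extract(InputSource source) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(source);
        XPath xpath = XPathFactory.newInstance().newXPath();

        List<SentenceInfo> result = new ArrayList<>();
        NodeList sentences = (NodeList) xpath.evaluate(
                "//sentence", doc, XPathConstants.NODESET);
        for (int i = 0; i < sentences.getLength(); i++) {
            Node sentence = sentences.item(i);
            SentenceInfo info = new SentenceInfo();
            info.sentence = sentence.getTextContent();
            // The leading "." makes each expression relative to this sentence.
            info.xscopes = texts(xpath, ".//xscope", sentence);
            info.cues = texts(xpath, ".//xscope//cue", sentence);
            result.add(info);
        }
        return result;
    }

    // Evaluates expr against context and returns the text of each matching node.
    private static List<String> texts(XPath xpath, String expr, Node context)
            throws XPathExpressionException {
        NodeList nodes = (NodeList) xpath.evaluate(expr, context, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<DocumentPart>"
                + "<sentence>No induction was detected, <xscope>whereas high level of"
                + " progeny virions was produced, <cue>suggesting</cue> that</xscope>.</sentence>"
                + "<sentence> HIV <xscope><cue>could</cue> mimic some stimuli</xscope>.</sentence>"
                + "</DocumentPart>";
        for (SentenceInfo info : extract(new InputSource(new StringReader(xml)))) {
            System.out.println("Sentence: " + info.sentence);
            System.out.println("Xscopes: " + info.xscopes);
            System.out.println("Cues: " + info.cues);
        }
    }
}
```

Evaluating relative expressions per sentence avoids the parent-walking in the DOM version, and nested xscopes are picked up by the descendant axis without a separate branch.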
