XML read same tag with different segment

Question

Below is the XML file:

<maindata>
        <publication-reference>
          <document-id document-id-type="docdb">
            <country>US</country>
            <doc-number>9820394ASD</doc-number>
            <date>20111101</date>
          </document-id>
          <document-id document-id-type="docmain">
            <doc-number>9820394</doc-number>
            <date>20111101</date>
          </document-id>
        </publication-reference>
</maindata>

I want to extract the value of the <doc-number> tag under the document-id whose type is "docmain". Below is my Java code; when executed, it extracts 9820394ASD instead of 9820394.

public static void main(String[] args) {
        String filePath ="D:/bs.xml";
        File xmlFile = new File(filePath);
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder;
        try {
            dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(xmlFile);
            doc.getDocumentElement().normalize();
            System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
            NodeList nodeList = doc.getElementsByTagName("publication-reference");
            List<Biblio> docList = new ArrayList<Biblio>();
            for (int i = 0; i < nodeList.getLength(); i++) {
                docList.add(getdoc(nodeList.item(i)));
            }

        } catch (SAXException | ParserConfigurationException | IOException e1) {
            e1.printStackTrace();
        }
    }
    private static Biblio getdoc(Node node) {
           Biblio bib = new Biblio();
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            bib.setCountry(getTagValue("country",element));
            bib.setDocnumber(getTagValue("doc-number",element));
            bib.setDate(getTagValue("date",element));          
        }
        return bib;
    }

Please let me know how to check the document-id-type attribute: the values should be extracted only if the type is docmain, and any other element should be skipped.

Added the getTagValue method:

private static String getTagValue(String tag, Element element) {
        NodeList nodeList = element.getElementsByTagName(tag).item(0).getChildNodes();
        Node node = (Node) nodeList.item(0);
        return node.getNodeValue();
    }

Aside from your problem: since you're trying to unmarshal XML into a class, if you're using Eclipse there is a tool called EclipseLink MOXy (https://www.eclipse.org/eclipselink/) that is well suited to this kind of operation. It is much more straightforward, and I use it quite a lot. — Mad Matts, Jul 18 '16 at 10:22

vanje · Answer 1 · 2016-07-19T11:14:03.637

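Change your getdoc() method so that it creates a Biblio object only for 'docmain' types. Note that the document-id-type attribute belongs to the document-id element, so getdoc() must be called with document-id nodes (for example from doc.getElementsByTagName("document-id")), not with publication-reference nodes.

private static Biblio getdoc(Node node) {
  Biblio bib = null;
  if (node.getNodeType() == Node.ELEMENT_NODE) {
    Element element = (Element) node;
    // getAttribute() returns "" (never null) when the attribute is missing
    String type = element.getAttribute("document-id-type");
    if ("docmain".equals(type)) {
      bib = new Biblio();
      // the docmain element has no <country>; getTagValue() as posted
      // throws a NullPointerException for a missing tag, so guard the call
      if (element.getElementsByTagName("country").getLength() > 0) {
        bib.setCountry(getTagValue("country",element));
      }
      bib.setDocnumber(getTagValue("doc-number",element));
      bib.setDate(getTagValue("date",element));
    }
  }
  return bib;
}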

Then, in your main method, add to the list only if getdoc() returns a non-null result:

for (int i = 0; i < nodeList.getLength(); i++) {
  Biblio biblio = getdoc(nodeList.item(i));
  if(biblio != null) {
    docList.add(biblio);
  }
}
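
For reference, here is a minimal, self-contained sketch of this DOM-only approach. It is not the asker's exact program: the class name DocMainDomSketch and the helpers textOrNull and docNumbers are made up for illustration, the XML is inlined instead of being read from D:/bs.xml, and only the doc-number is collected rather than a full Biblio object.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DocMainDomSketch {
    // Sample document from the question, inlined so the sketch runs without a file.
    static final String XML =
        "<maindata><publication-reference>"
      + "<document-id document-id-type=\"docdb\">"
      + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
      + "</document-id>"
      + "<document-id document-id-type=\"docmain\">"
      + "<doc-number>9820394</doc-number><date>20111101</date>"
      + "</document-id>"
      + "</publication-reference></maindata>";

    // Hypothetical helper: text of the first matching descendant, or null if the tag is absent.
    static String textOrNull(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    // Collects the doc-number of every document-id whose type attribute equals the given type.
    static List<String> docNumbers(String xml, String type) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Iterate over document-id elements, because that is where the attribute lives.
        NodeList ids = doc.getElementsByTagName("document-id");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < ids.getLength(); i++) {
            Element id = (Element) ids.item(i);
            if (type.equals(id.getAttribute("document-id-type"))) {
                result.add(textOrNull(id, "doc-number"));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(docNumbers(XML, "docmain")); // prints [9820394]
    }
}
```

The null-safe lookup matters here because the docmain element has no country child.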

Update: OK, this is horrible, sorry. You should really learn a little bit about XPath. Let me rewrite this using XPath expressions.

First we need four XPath expressions. The first one extracts a node list of all document-id elements with type docmain.

The XPath expression for this is: /maindata/publication-reference/document-id[@document-id-type='docmain'] (with the whole XML document as context).

Here the predicate in [] ensures that only document-id elements with type docmain are selected.

Then we need one expression for each field of a document-id element (with the document-id element as context):

country: country
docnumber: doc-number
date: date

We compile the expressions once in a static initializer:

private static XPathExpression xpathDocId;
private static XPathExpression xpathCountry;
private static XPathExpression xpathDocnumber;
private static XPathExpression xpathDate;

static {
  try {
    XPath xpath = XPathFactory.newInstance().newXPath();
    // Context is the whole document. Find all document-id elements with type docmain
    xpathDocId = xpath.compile("/maindata/publication-reference/document-id[@document-id-type='docmain']");

    // Context is a document-id element. 
    xpathCountry = xpath.compile("country");
    xpathDocnumber = xpath.compile("doc-number");
    xpathDate = xpath.compile("date");
  } catch (XPathExpressionException e) {
    e.printStackTrace();
  }
}

Then we rewrite the getdoc method. It now takes a document-id element as input and creates a Biblio instance from it using the XPath expressions:

private static Biblio getdoc(Node element) throws XPathExpressionException {
  Biblio biblio = new Biblio();
  biblio.setCountry((String) xpathCountry.evaluate(element, XPathConstants.STRING));
  biblio.setDocnumber((String) xpathDocnumber.evaluate(element, XPathConstants.STRING));
  biblio.setDate((String) xpathDate.evaluate(element, XPathConstants.STRING));
  return biblio;
}

Then, in the main() method, use the XPath expression to extract only the needed elements (getdoc now throws XPathExpressionException, so add it to the catch clause):

  NodeList nodeList = (NodeList) xpathDocId.evaluate(doc, XPathConstants.NODESET);
  List<Biblio> docList = new ArrayList<Biblio>();
  for (int i = 0; i < nodeList.getLength(); i++) {
    docList.add(getdoc(nodeList.item(i)));
  }
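
Put together, the pieces above might look like the following runnable sketch. Some parts are assumptions: the question does not show the Biblio class, so a bare stand-in with three fields is used; the XML is parsed from an inlined string rather than a file; and the per-field expressions are evaluated through xpath.evaluate(String, Object) instead of precompiled static fields, purely to keep the example short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathBiblioSketch {
    // Hypothetical stand-in for the question's Biblio class, which is not shown.
    static class Biblio {
        String country, docnumber, date;
    }

    // Sample document from the question, inlined so the sketch runs without a file.
    static final String XML =
        "<maindata><publication-reference>"
      + "<document-id document-id-type=\"docdb\">"
      + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
      + "</document-id>"
      + "<document-id document-id-type=\"docmain\">"
      + "<doc-number>9820394</doc-number><date>20111101</date>"
      + "</document-id>"
      + "</publication-reference></maindata>";

    static List<Biblio> readDocmain(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Context is the whole document: select only the docmain document-id elements.
        XPathExpression docIds = xpath.compile(
            "/maindata/publication-reference/document-id[@document-id-type='docmain']");
        NodeList nodes = (NodeList) docIds.evaluate(doc, XPathConstants.NODESET);

        List<Biblio> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node id = nodes.item(i);
            Biblio b = new Biblio();
            // Relative paths are resolved against the current document-id node.
            // A missing child (country, for docmain) yields the empty string, not null.
            b.country = xpath.evaluate("country", id);
            b.docnumber = xpath.evaluate("doc-number", id);
            b.date = xpath.evaluate("date", id);
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (Biblio b : readDocmain(XML)) {
            System.out.println(b.docnumber + " " + b.date);
        }
    }
}
```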

Thanks, but getdoc() always gets an empty value for the type (an empty string), so my output collection is empty. — Prabu, Jul 18 '16 at 12:25

score 1 · Answer 2 · edited May 23 '17 at 11:58

The value can be retrieved with the following XPath expression, using the DOM and XPath APIs.

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(new File(...) );
    XPathFactory xPathfactory = XPathFactory.newInstance();
    XPath xpath = xPathfactory.newXPath();
    XPathExpression expr = xpath.compile("//document-id[@document-id-type=\"docmain\"]/doc-number/text()");
    String value = expr.evaluate(doc);
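
If only this one value is needed, the DOM step can be skipped: the XPath interface also has an evaluate(String, InputSource) overload that parses the input itself. The sketch below uses it on an inlined copy of the XML from the question; the class and method names are hypothetical, and a FileInputStream or file URL could be wrapped in the InputSource instead of the StringReader.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class DocmainNumberSketch {
    // Sample document from the question, inlined so the sketch runs without a file.
    static final String XML =
        "<maindata><publication-reference>"
      + "<document-id document-id-type=\"docdb\">"
      + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
      + "</document-id>"
      + "<document-id document-id-type=\"docmain\">"
      + "<doc-number>9820394</doc-number><date>20111101</date>"
      + "</document-id>"
      + "</publication-reference></maindata>";

    // Evaluates the expression directly against the XML source and returns the result as a String.
    static String docmainNumber(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(
            "//document-id[@document-id-type='docmain']/doc-number/text()",
            new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(docmainNumber(XML)); // prints 9820394
    }
}
```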

score 0 · Answer 3 · answered Jul 20 '16 at 10:11

Thanks for the help; the code is as follows:

String Number = xPath.compile("//publication-reference//document-id[@document-id-type=\"docmain\"]/doc-number").evaluate(xmlDocument);
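
One caveat: evaluating a node-set expression to a String yields the value of the first matching node only. If a file can contain several publication-reference elements, something like the sketch below collects all of them with XPathConstants.NODESET. The sample XML (numbers 111 and 222) is made up for illustration, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AllDocmainNumbersSketch {
    // Made-up sample with two publication-reference elements.
    static final String XML =
        "<maindata>"
      + "<publication-reference>"
      + "<document-id document-id-type=\"docdb\"><doc-number>111A</doc-number></document-id>"
      + "<document-id document-id-type=\"docmain\"><doc-number>111</doc-number></document-id>"
      + "</publication-reference>"
      + "<publication-reference>"
      + "<document-id document-id-type=\"docdb\"><doc-number>222A</doc-number></document-id>"
      + "<document-id document-id-type=\"docmain\"><doc-number>222</doc-number></document-id>"
      + "</publication-reference>"
      + "</maindata>";

    static final String EXPR =
        "//publication-reference//document-id[@document-id-type='docmain']/doc-number";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // String evaluation: the value of the first match in document order.
    static String first(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(EXPR, doc);
    }

    // Node-set evaluation: one entry per matching doc-number element.
    static List<String> all(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(EXPR, doc, XPathConstants.NODESET);
        List<String> numbers = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            numbers.add(nodes.item(i).getTextContent());
        }
        return numbers;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(first(doc)); // prints 111
        System.out.println(all(doc));   // prints [111, 222]
    }
}
```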

Prabu

