
I am wondering whether there is an API or open-source JAR that can extract a subset of an XML document based on a given path.

For example, I have an XML skeleton (a YIN model, converted from a YANG model):

<xml .....>
<data>
   <model1>
     <element1>
        <id />
        <name />
        <address />
     </element1>
   </model1>
   <model2>
     <element2>
        <uid />
        <something />
     </element2>
   </model2>
   ....
</data>

Given a path:

data/model1/element1[id='1']/name, where the name value is 'John'

I want the following to be returned:

<xml .....>
<data>
   <model1>
     <element1>
        <id>1</id>
        <name>John</name>
     </element1>
   </model1>
</data>

I am not quite sure what keywords to search for. Hopefully someone who knows XML well can give suggestions.

Another question: if there is no existing API or open-source library, what would be the best way to handle this? Should I use DOM, since I need the whole tree structure from my skeleton? Besides DOM using a lot of memory, what other drawbacks does it have?

asun
  • There are many ways you could do this. Is your goal to read the data and do something with it, or to rewrite it? – teppic Apr 07 '18 at 08:14
  • @teppic my goal is to use the given path to get the portion of the xml from the skeleton and then put the value in. – asun Apr 07 '18 at 08:31
  • You can extract `` quite easily with XPath like `data/model1[element1[id='1' and name='John']]`. There's an XPath API in Java (see [this question](https://stackoverflow.com/questions/2811001/how-to-read-xml-using-xpath-in-java)). – lexicore Apr 07 '18 at 09:30
  • If you want it with ``, it's a bit more complex. You can put it together using DOM or apply an XSLT. There are many ways to achieve what you want. I'd suggest trying the XPath approach first. – lexicore Apr 07 '18 at 09:32
  • @lexicore I want to get the complete XML from the XPath, because my skeleton will have no data, only elements. Are you saying that if I use XPath without id = 1 and name = john, I can get the output I showed in my description? – asun Apr 07 '18 at 19:23
  • Basically, I want a subset of the skeleton XML. The skeleton only has elements and attributes (the tree structure). I want to get the subset of the skeleton based on the path first, and then put values/text into the necessary elements. – asun Apr 07 '18 at 19:25

1 Answer

You can use the built-in javax.xml packages to read and write the data, and query the XML with the XML Path Language (XPath). For example, to extract the subtree of every <element1>:

/data/model1/element1

Or to extract only the <element1> subtrees whose child element <id> has the text "1":

/data/model1/element1[id/text() = 1] 

I wrote a small program to demonstrate the usage. You need to:

  • create an org.w3c.dom.Document
  • parse the XML content into this object
  • compile your XPath expression
  • evaluate the compiled XPath against the document to get the matching nodes as a NodeList
  • export the NodeList or do any other desired task with it.

You can compile and run the program as follows, passing the XPath expression as the first argument:

$ javac Demo.java
$ java Demo /data/model1/element1
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<data>
  <model1>
    <element1>

      <id>1</id>

      <name>John</name>

      <address>xxx</address>

    </element1>
    <element1>

      <id>2</id>

      <name>Tom</name>

      <address>yyy</address>

    </element1>
  </model1>
</data>

$ java Demo '/data/model1/element1[id/text() = 1]'
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<data>
  <model1>
    <element1>

      <id>1</id>

      <name>John</name>

      <address>xxx</address>

    </element1>
  </model1>
</data>

The full program:

import java.io.*;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.*;
import org.w3c.dom.*;

public class Demo {

  private static final String XML =
      "<?xml version=\"1.0\"?>\n"
          + "<data>\n"
          + "  <model1>\n"
          + "    <element1>\n"
          + "      <id>1</id>\n"
          + "      <name>John</name>\n"
          + "      <address>xxx</address>\n"
          + "    </element1>\n"
          + "    <element1>\n"
          + "      <id>2</id>\n"
          + "      <name>Tom</name>\n"
          + "      <address>yyy</address>\n"
          + "    </element1>\n"
          + "  </model1>\n"
          + "  <model2>\n"
          + "    <element2>\n"
          + "      <uid />\n"
          + "      <something />\n"
          + "    </element2>\n"
          + "  </model2>\n"
          + "</data>";

  public static void main(String[] args) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    Document source;
    try (InputStream in = new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8))) {
      source = factory.newDocumentBuilder().parse(in);
    }

    // Extract
    XPath xPath = XPathFactory.newInstance().newXPath();
    XPathExpression expr = xPath.compile(args[0]);

    NodeList nodeList = (NodeList) expr.evaluate(source, XPathConstants.NODESET);

    // Export
    Document target = factory.newDocumentBuilder().newDocument();
    Element data = target.createElement("data");
    Element model1 = target.createElement("model1");
    data.appendChild(model1);
    target.appendChild(data);
    for (int i = 0; i < nodeList.getLength(); i++) {
      Node node = nodeList.item(i);
      Node newNode = target.importNode(node, true);
      model1.appendChild(newNode);
    }
    System.out.println(getStringFrom(target));
  }

  private static String getStringFrom(Document doc) throws TransformerException {
    DOMSource domSource = new DOMSource(doc);
    StringWriter writer = new StringWriter();
    StreamResult result = new StreamResult(writer);
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer transformer = tf.newTransformer();
    // set indent
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
    transformer.transform(domSource, result);
    return writer.toString();
  }
}
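
The demo above hard-codes the <data> and <model1> wrappers in the target document. If the path can point anywhere in the skeleton, one option is to rebuild the ancestor chain of each match instead. The following is only a minimal sketch of that idea, not a tested library: the class name PathSubset and the helpers extract and copyAncestors are made up for illustration, and it assumes the XPath matches elements that are not nested inside one another.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.IdentityHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: keeps only the matched elements plus their ancestors.
public class PathSubset {

  public static String extract(String xml, String xpath) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document source = builder.parse(new InputSource(new StringReader(xml)));
    NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
        .evaluate(xpath, source, XPathConstants.NODESET);

    Document target = builder.newDocument();
    // Maps each source ancestor to its shallow copy so shared ancestors are created once.
    Map<Node, Element> copies = new IdentityHashMap<>();
    for (int i = 0; i < matches.getLength(); i++) {
      Node match = matches.item(i);
      if (match.getNodeType() != Node.ELEMENT_NODE) {
        continue; // this sketch only handles element matches
      }
      Node parentCopy = copyAncestors(match.getParentNode(), target, copies);
      parentCopy.appendChild(target.importNode(match, true)); // deep copy of the match
    }

    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter writer = new StringWriter();
    transformer.transform(new DOMSource(target), new StreamResult(writer));
    return writer.toString();
  }

  // Recreates the chain of ancestors (without their other children) in the target document.
  private static Node copyAncestors(Node sourceNode, Document target, Map<Node, Element> copies) {
    if (sourceNode == null || sourceNode.getNodeType() == Node.DOCUMENT_NODE) {
      return target; // reached the top: attach directly to the new document
    }
    Element existing = copies.get(sourceNode);
    if (existing != null) {
      return existing;
    }
    Node parentCopy = copyAncestors(sourceNode.getParentNode(), target, copies);
    Element copy = (Element) target.importNode(sourceNode, false); // shallow: no children
    parentCopy.appendChild(copy);
    copies.put(sourceNode, copy);
    return copy;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<data><model1><element1><id>1</id><name>John</name>"
        + "<address>xxx</address></element1></model1>"
        + "<model2><element2><uid/></element2></model2></data>";
    System.out.println(extract(xml, "/data/model1/element1[id='1']/name"));
  }
}
```

Note that a predicate such as [id='1'] only filters; the <id> element itself is not part of the result unless the expression selects it. An XPath union such as /data/model1/element1[id='1']/id | /data/model1/element1[id='1']/name should bring both leaves back under the same <element1>.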
Mincong Huang
  • That's pretty cool. However, if the xml variable has an outer `<xml>` root element wrapped around `<data>`, would XPath still return the subtree? – asun Apr 08 '18 at 06:21
  • In this case, the root node is no longer `<data>` but `<xml>`, so the XPath should be changed. Use either `/xml/data/model1/element1` to describe the complete path, or `//data/model1/element1` (two slashes) to select matching descendant nodes anywhere in the document. @asun – Mincong Huang Apr 08 '18 at 06:54
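
To illustrate the difference between the absolute path and the two-slash form, here is a small sketch. The wrapped input string and the class name RootPathDemo are assumptions made up for this example; it simply counts how many nodes each expression selects once an outer <xml> element is the root.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RootPathDemo {

  // Hypothetical input: the same data, now wrapped in an outer <xml> root element.
  static final String WRAPPED =
      "<xml><data><model1>"
          + "<element1><id>1</id><name>John</name></element1>"
          + "<element1><id>2</id><name>Tom</name></element1>"
          + "</model1></data></xml>";

  // Returns the number of nodes the given XPath selects in WRAPPED.
  static int count(String xpath) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(WRAPPED)));
    XPath xp = XPathFactory.newInstance().newXPath();
    NodeList nodes = (NodeList) xp.evaluate(xpath, doc, XPathConstants.NODESET);
    return nodes.getLength();
  }

  public static void main(String[] args) throws Exception {
    // The old absolute path no longer starts at the root element.
    System.out.println(count("/data/model1/element1"));
    // The complete path, including the new root.
    System.out.println(count("/xml/data/model1/element1"));
    // The descendant form, which does not depend on what the root is called.
    System.out.println(count("//data/model1/element1"));
  }
}
```

The descendant form is more tolerant of wrapper changes, but it also matches any data/model1/element1 chain deeper in the document, so the complete path is the stricter choice when the structure is known.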