TASK: I have an existing XML document (UTF-8) that uses XML namespaces and XML Schema. I need to parse it, navigate to a particular element, append content (which also needs to use namespace prefixes) to that element, and then write the document out again.

Which XML parser library is best for this TASK?

I've seen a previous thread (Best XML parser for Java) but was not sure whether dom4j or JDOM handle namespaces and XML Schema well, or how good their UTF-8 support is.

Some parsers that seem suited to the task:
JDOM
dom4j
XOM
Woodstox

Any idea which one is best? :-) I use JDK 6 and would prefer NOT to use the built-in SAX/DOM facilities for this job, because they require me to write too much code.

Some examples of doing such a task would help.

anjanb
  • How is doing that with the built-in DOM facility going to be too much to code? Ah, right - Java... ;-) But seriously: is 15-20 lines too much code in your opinion? What would be acceptable then? – Thomas Mar 26 '10 at 13:11

4 Answers

Use XSLT. Seriously. This is a perfect job for it. Just use a copy template to copy everything as-is, except at the place where you need to add more XML. You can even add the new content by writing literal XML instead of manipulating a DOM.

This is the copy template:

<xsl:template match="node() | @*">
    <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
</xsl:template>

I know a lot of people hate XSLT, but this is a task where it really shines and takes almost no code. Also, you can do it with just what's in the JDK.
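A minimal sketch of that approach using only the JDK's javax.xml.transform (TrAX) API: the copy template above, plus one extra template that matches the target element, copies it, and appends new namespaced content written as literal XML. The namespace URI `urn:example:orders`, the `o` prefix and the `order`/`paid` element names are hypothetical placeholders; substitute the values from your own schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AppendWithXslt {

    // Hypothetical stylesheet: "urn:example:orders", the "o" prefix and the
    // order/paid names are placeholders, not taken from any real schema.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:o='urn:example:orders'>"
        + "<xsl:output method='xml' encoding='UTF-8'/>"
        // Copy template: copy every node and attribute as-is...
        + "<xsl:template match='node() | @*'>"
        + "<xsl:copy><xsl:apply-templates select='@* | node()'/></xsl:copy>"
        + "</xsl:template>"
        // ...except o:order, which is copied and then gets a new child as literal XML.
        + "<xsl:template match='o:order'>"
        + "<xsl:copy><xsl:apply-templates select='@* | node()'/><o:paid>true</o:paid></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to an XML string and returns the serialized result. */
    public static String transform(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String in = "<o:orders xmlns:o='urn:example:orders'><o:order id='1'/></o:orders>";
        System.out.println(transform(in));
    }
}
```

Because `o` is declared on the stylesheet root, the literal `<o:paid>` element is created in that namespace, and `xsl:copy` carries the source elements' namespace declarations through unchanged. For real files you would pass `new StreamSource(file)` and `new StreamResult(file)` so the parser and serializer deal with the UTF-8 bytes themselves.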

Russell Leggett

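Using the JDK's built-in DOM (JAXP) API, taking an InputStream and making it a Document:

import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

InputStream inputStream = httpURLConnection.getInputStream(); // raw bytes; the parser detects the encoding
DocumentBuilderFactory docbf = DocumentBuilderFactory.newInstance();
docbf.setNamespaceAware(true); // off by default; needed for namespaced documents
DocumentBuilder docbuilder = docbf.newDocumentBuilder();
Document document = docbuilder.parse(inputStream, baseUrl); // baseUrl resolves relative URIs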

At that point, you have the XML in a Java object. Done. Easy.
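The `setNamespaceAware(true)` line is the one that matters for a namespaced document: `DocumentBuilderFactory` is not namespace-aware by default, and a DOM built without it reports no namespace URIs at all. A small sketch (the class and method names are only for illustration) that parses the same bytes both ways. Handing the parser bytes rather than a `String` also lets it work out the UTF-8 encoding on its own.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class NamespaceAwareDemo {

    /** Parses the bytes and returns the namespace URI the DOM reports for the root element. */
    public static String rootNamespace(byte[] xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(namespaceAware); // the factory default is false
        Document doc = f.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<a:feed xmlns:a='http://www.w3.org/2005/Atom'/>".getBytes("UTF-8");
        System.out.println(rootNamespace(xml, true));  // prints http://www.w3.org/2005/Atom
        System.out.println(rootNamespace(xml, false)); // prints null
    }
}
```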

You can either walk through the Document with the DOM API, or use XPath, which I find easier (once I learned it).

Build an XPath object, which takes a bit:

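import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

public static XPath buildXPath() {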
    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();
    xpath.setNamespaceContext(new AtomNamespaceContext());
    return xpath;
}


public class AtomNamespaceContext implements NamespaceContext {

    public String getNamespaceURI(String prefix) {
        if (prefix == null)
            throw new NullPointerException("Null prefix");
        else if ("a".equals(prefix))
            return "http://www.w3.org/2005/Atom";
        else if ("app".equals(prefix))
            return "http://www.w3.org/2007/app";
        else if ("os".equals(prefix))
            return "http://a9.com/-/spec/opensearch/1.1/";
        else if ("x".equals(prefix)) 
            return "http://www.w3.org/1999/xhtml";
        else if ("xml".equals(prefix))
            return XMLConstants.XML_NS_URI;
        return XMLConstants.NULL_NS_URI;
    }

    // This method isn't necessary for XPath processing.
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    // This method isn't necessary for XPath processing either.
    public Iterator<String> getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }
}

Then just use it, which (thankfully) doesn't take much time at all:

XPath xpath = buildXPath();
return Integer.parseInt(xpath.evaluate("/a:feed/os:totalResults/text()", document));
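For the original task you still need to append the new element and write the document out again. A sketch using only the JDK, assuming a hypothetical target: the first Atom `entry`, which receives a new `category` element with a `term` attribute. It locates the element with `getElementsByTagNameNS` for brevity; `xpath.evaluate(expression, document, XPathConstants.NODE)` with the namespace context above would work as well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AppendAndWrite {

    static final String ATOM = "http://www.w3.org/2005/Atom";

    /** Appends a hypothetical <category term="..."/> to the first Atom entry and writes UTF-8. */
    public static void appendCategory(InputStream in, OutputStream out, String term)
            throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(in);

        // Match on namespace URI + local name, so the document's own prefix doesn't matter.
        Element entry = (Element) doc.getElementsByTagNameNS(ATOM, "entry").item(0);
        if (entry == null) {
            throw new IllegalArgumentException("no Atom <entry> element found");
        }

        // Reuse the target's prefix (null means the default namespace), so the new
        // element is bound by a namespace declaration that is already in scope.
        String prefix = entry.getPrefix();
        String qname = (prefix == null) ? "category" : prefix + ":category";
        Element category = doc.createElementNS(ATOM, qname);
        category.setAttribute("term", term);
        entry.appendChild(category);

        // An identity Transformer serializes the modified DOM back to UTF-8 bytes.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.transform(new DOMSource(doc), new StreamResult(out));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a:feed xmlns:a='http://www.w3.org/2005/Atom'><a:entry/></a:feed>";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        appendCategory(new ByteArrayInputStream(xml.getBytes("UTF-8")), out, "news");
        System.out.println(out.toString("UTF-8"));
    }
}
```

Reusing the parent's prefix avoids having to declare a new one; if the appended content must use a prefix the document doesn't declare yet, `createElementNS` with that prefix still records the namespace URI on the node.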
Dean J
  • +1 - JDOM is the easiest API to learn for this. XSLT will be a better choice if you have tasks like this often, though. – jsight Mar 26 '10 at 14:19
Since writing too much code is the main issue for you, you might want to consider jOOX:

http://code.google.com/p/joox/

I created jOOX as a port of jQuery to Java. The underlying technology is Java's standard DOM. Some sample code:

import static org.joox.JOOX.*;

// Find the order at index 4 and add an element "paid"
$(document).find("orders").children().eq(4)
           .append("<paid>true</paid>");

// Find those orders that are paid and flag them as "settled"
$(document).find("orders").children().find("paid")
           .after("<settled>true</settled>");

// Add a complex element
$(document).find("orders").append(
  $("order", $("date", "2011-08-14"),
             $("amount", "155"),
             $("paid", "false"),
             $("settled", "false")).attr("id", "13"));

Note: Namespaces are not yet explicitly supported, but you can work around that.

Lukas Eder
  • jOOX is a good idea, but I lost time because it doesn't support manipulating attributes. Without that, it is only suitable for reading. – wojand Nov 22 '12 at 13:56
  • @wojand: What makes you think so? jOOX allows for manipulation of attributes. See the third example in my answer, which sets `id="13"` – Lukas Eder Nov 22 '12 at 14:05
  • Show me how to add attributes to an existing tag. You can add a tag, but the problem is when you need to add attributes to an existing tag. I could not find a simple solution, and I did not find any example on the jOOX page for this problem. Your example above appends a tag with an attribute, but how do I APPEND ONLY ONE attribute to ${} WITHOUT a tag? – wojand Nov 28 '12 at 14:06
  • I'm not sure I understand. You can only add attributes to elements, why would you add an attribute to something "empty"? – Lukas Eder Nov 28 '12 at 16:10
  • You don't understand me. Your example shows how to add a tag that contains an attribute to a document, but not how to add one attribute to a tag. The "append" method needs tags; I cannot add ONLY attributes using "append". As a result, I must rewrite the tag when I only want to add an attribute. Maybe this problem is solved, but it is not described in your documentation, and once I had lost more time than I planned, I moved my code to another library. – wojand Dec 10 '12 at 12:22
  • @wojand: It doesn't really matter if the tag to which an attribute is added was already contained in one or the other document. `attr(String, String)` just adds an attribute to the element that was previously matched... I'm sorry that the documentation is a bit scarce right now... – Lukas Eder Dec 10 '12 at 13:03
  • @gaurav: It just wraps `org.w3c.dom` and as such inherits the DOM's thread unsafety. – Lukas Eder Aug 20 '19 at 21:04
It sounds like you could write an XSLT stylesheet to do what you want.

Kevin