
For one of our applications, I've written a utility that uses Java's DOM parser. It takes an XML file, parses it, and then retrieves the data using the following methods:

getElementsByTagName()
getElementAtIndex()
getFirstChild()
getNextSibling()
getTextContent()

Now I have to do the same thing again, but I am wondering whether it would be better to use an XSLT stylesheet. The organisation that sends us the XML file keeps changing its schema, which means we have to change our code to cater for each schema change. I'm not very familiar with XSLT, so I'm trying to find out whether I'm better off using XSLT stylesheets rather than "manual parsing".

The reason XSLT stylesheets look attractive is that, as far as I understand, if the schema for the XML file changes I will only need to change the stylesheet. Is this correct?

The other thing I would like to know is which of the two (XSLT transformer or DOM parser) performs better. For the manual option, I just use the DOM parser to parse the XML file. How does the XSLT transformer actually parse the file? Does it add overhead compared to manually parsing the XML file? I ask because performance is important given the nature of the data I will be processing.

Any advice?

Thanks

Edit

Basically, what I am currently doing is parsing an XML file and processing the values in some of the XML elements. I don't transform the XML file into any other format. I just extract some values, fetch a row from an Oracle database, and save a new row into a different table. The XML file I parse just contains reference values I use to retrieve some data from the database.

Is XSLT not suitable in this scenario? Is there a better approach that I can use to avoid code changes if the schema changes?

Edit 2

Apologies for not being clear enough about what I am doing with the XML data. There is an XML file which contains some information. I extract this information from the XML file and use it to retrieve more information from a local database. The data in the XML file is more like a set of reference keys for the data I need in the database. I then take the content I extracted from the XML file, plus the content I retrieved from the database using a specific key from the XML file, and save that data into another database table.

The problem is that I know how to write a DOM parser to extract the information I need from the XML file, but I was wondering whether using an XSLT stylesheet would be a better option, as I wouldn't have to change the code if the schema changes.

Reading the responses below, it sounds like XSLT is only used for transforming an XML file into another XML file or some other format. Given that I don't intend to transform the XML file, there is probably no need to add the overhead of parsing the XSLT stylesheet as well as the XML file.

ziggy
  • XSLT is used to transform an XML document into another (XML / HTML / text) document. It's not used to parse and get access to the contents of a document. What does your DOM parser do? – JB Nizet Feb 21 '11 at 19:01
  • 2
    I don't think the requirements are described well enough for this not to be a subjective question. Small remark: **traversing** (not parsing) a tree with low-level DOM methods could be faster than using a high-level language (like XSLT); designing and updating a low-level traversal could be harder and more complex than doing it in a high-level language (like XSLT). If, after processing the incoming data, another XML tree must be built, again, low-level methods could be faster but harder to maintain and update. Plus we would then be squarely in XSLT's specific field... –  Feb 21 '11 at 19:07
  • @Alejandro +1. You should really post this as an answer. – Flack Feb 21 '11 at 19:45

4 Answers

4

Transforming XML documents into other formats is XSLT's reason for being. You can use XSLT to output HTML, JSON, another XML document, or anything else you need. You don't specify what kind of output you want. If you're just grabbing the contents of a few elements, then maybe you won't want to bother with XSLT. For anything more, XSLT offers an elegant solution. This is primarily because XSLT understands the structure of the document it's working on. Its processing model is tree traversal and pattern matching, which is essentially what you're manually doing in Java.

You could use XSLT to transform your source data into the representation of your choice. Your code will always work on this structure. Then, when the organization you're working with changes the schema, you only have to change your XSLT to transform the new XML into your custom format. None of your other code needs to change. Why should your business logic care about the format of its source data?
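
As a rough illustration of that idea, here is a minimal sketch using the standard javax.xml.transform API. The element names (batch, record, ref, keys, key), the class name, and the stylesheet itself are made up for the example, since the question doesn't show the real schema. The point is only that the stylesheet is the single piece that knows the sender's layout; the rest of the code reads the stable keys/key format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NormalizeWithXslt {

    // Hypothetical stylesheet: maps the sender's current layout (batch/record/ref)
    // onto a stable <keys><key>...</key></keys> format of our own choosing.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<keys><xsl:for-each select='/batch/record/ref'>"
      + "<key><xsl:value-of select='.'/></key>"
      + "</xsl:for-each></keys>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the source XML and returns the result as a string.
    static String normalize(String sourceXml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(sourceXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<batch><record><ref>A1</ref></record>"
                     + "<record><ref>B2</ref></record></batch>";
        System.out.println(normalize(input, STYLESHEET));
    }
}
```

If the sender later moves ref somewhere else, only the select expression in the stylesheet would need to change, provided the stylesheet is loaded from a file rather than compiled into the class as it is here.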

Wayne
  • 1
    Another reason to use XSLT is to decouple your code from an actual file format, which is quite handy when you have no control over the format. (as is often the case.) – biziclop Feb 21 '11 at 19:40
3

You are right that XSLT's processing model based on a rule-based event-driven approach makes your code more resilient to changes in the schema.
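
To make the resilience point concrete, here is a small sketch with invented element names (batch, record, group, meta, ref) and a hypothetical class name. The stylesheet has a rule that fires on any ref element wherever it sits, so the same stylesheet handles both an "old" and a "new" layout without modification. Navigational DOM code that walks from the root by position would typically break on the second layout.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RuleBasedExtract {

    // Two rules: print the value of every <ref>, and suppress all other text.
    // The built-in template rules take care of walking the rest of the tree.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='ref'>"
      + "<xsl:value-of select='.'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "<xsl:template match='text()'/>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the given XML and returns the text output.
    static String extract(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical "old" layout: ref sits directly under record.
        String oldLayout = "<batch><record><ref>A1</ref></record></batch>";
        // Hypothetical "new" layout: extra nesting was introduced around ref.
        String newLayout = "<batch><group><record><meta><ref>A1</ref></meta>"
                         + "</record></group></batch>";
        System.out.print(extract(oldLayout));
        System.out.print(extract(newLayout));
    }
}
```

This only covers changes in nesting. If the sender renames ref, the match pattern still has to change, but that is a one-line edit in the stylesheet.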

Because it's a different processing model to the procedural/navigational approach that you use with DOM, there is a learning and familiarisation curve, which some people find frustrating; if you want to go this way, be patient, because it will be a while before the ideas click into place. Once you are there, it's much easier than DOM programming.

The performance of a good XSLT processor will be good enough for your needs. It's of course possible to write very inefficient code, just as it is in any language, but I've rarely seen a system where XSLT was the bottleneck. Very often the XML parsing takes longer than the XSLT processing (and that's the same cost as with DOM or JAXB or anything else.)

As others have said, a lot depends on what you want to do with the XML data, which you haven't really explained.

Michael Kay
1

I think that what you actually need is an XPath expression. You could configure that expression in a properties file or wherever you keep your setup parameters.

That way, you'd just change the XPath expression whenever your customer hides the info you use in yet another place.

Basically, XSLT is overkill; you just need XPath. A single XPath expression will let you home in on each value you are after.
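
A minimal sketch of the "XPath in a properties file" idea, assuming Java 6 or later (for Properties.load(Reader)) and the standard javax.xml.xpath API. The property name maxThread, the class name, and the sample XML are invented for the example; in practice the properties would be loaded from a file on disk rather than from a string.

```java
import java.io.StringReader;
import java.util.Properties;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ConfiguredXPath {

    // Evaluates the XPath expression stored under the given property name
    // against the XML text and returns the string value of the result.
    static String lookup(Properties cfg, String name, String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(cfg.getProperty(name),
                                  new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath for property " + name, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a real .properties file; the key name is hypothetical.
        Properties cfg = new Properties();
        cfg.load(new StringReader("maxThread=/config/param[@id='MaxThread']\n"));

        String xml = "<config><param id='MaxThread'>250</param>"
                   + "<param id='rTmo'>5000</param></config>";
        System.out.println(lookup(cfg, "maxThread", xml)); // prints 250
    }
}
```

When the sender moves the value, only the line in the properties file changes; the Java code stays as it is.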

Update

Since we are now talking about JDK 1.4, I've included below three different ways of fetching text from an XML file using XPath (as simple as possible, no NPE guard fluff, I'm afraid ;-)

Starting from the most up to date.

0. First the sample XML config file

<?xml version="1.0" encoding="UTF-8"?>
<config>
    <param id="MaxThread" desc="MaxThread"        type="int">250</param>
    <param id="rTmo"      desc="RespTimeout (ms)" type="int">5000</param>
</config>

1. Using JAXP 1.3 standard part of Java SE 5.0

import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;

public class TestXPath {

    private static final String CFG_FILE = "test.xml" ;
    private static final String XPATH_FOR_PRM_MaxThread = "/config/param[@id='MaxThread']/text()";
    public static void main(String[] args) {

        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        docFactory.setNamespaceAware(true);
        DocumentBuilder builder;
        try {
            builder = docFactory.newDocumentBuilder();
            Document doc = builder.parse(CFG_FILE);
            XPathExpression expr = XPathFactory.newInstance().newXPath().compile(XPATH_FOR_PRM_MaxThread);
            Object result = expr.evaluate(doc, XPathConstants.NUMBER);
            if ( result instanceof Double ) {
                System.out.println( ((Double)result).intValue() );
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

2. Using JAXP 1.2 standard part of Java SE 1.4-2

import javax.xml.parsers.*;
import org.apache.xpath.XPathAPI;
import org.w3c.dom.*;

public class TestXPath {

    private static final String CFG_FILE = "test.xml" ;
    private static final String XPATH_FOR_PRM_MaxThread = "/config/param[@id='MaxThread']/text()";

    public static void main(String[] args) {

        try {
            DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
            docFactory.setNamespaceAware(true);
            DocumentBuilder builder = docFactory.newDocumentBuilder();
            Document doc = builder.parse(CFG_FILE);
            Node param = XPathAPI.selectSingleNode( doc, XPATH_FOR_PRM_MaxThread );
            if ( param instanceof Text ) {
                System.out.println( Integer.decode(((Text)(param)).getNodeValue() ) ); 
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

3. Using JAXP 1.1 standard part of Java SE 1.4 + jdom + jaxen

You need to add these 2 jars (available from www.jdom.org - binaries, jaxen is included).

import java.io.File;
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;

public class TestXPath {

    private static final String CFG_FILE = "test.xml" ;
    private static final String XPATH_FOR_PRM_MaxThread = "/config/param[@id='MaxThread']/text()";

    public static void main(String[] args) {
        try {
            SAXBuilder sxb = new SAXBuilder();
            Document doc = sxb.build(new File(CFG_FILE));
            Element root = doc.getRootElement();
            XPath xpath = XPath.newInstance(XPATH_FOR_PRM_MaxThread);
            Text param = (Text) xpath.selectSingleNode(root);
            Integer maxThread = Integer.decode( param.getText() );
            System.out.println( maxThread );
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Alain Pannetier
  • It's not merely 'overkill', it's the wrong tool because XSLT is for creating output directly. – Jesse Millikan Feb 21 '11 at 20:36
  • 1
    @Jesse, 99% right! However, I left it open because arguably you can use a ByteArrayOutputStream as the transformation output, so that you get the result in a string. This could have been ziggy's intention. I use this technique a lot when I want to optimise a stylesheet and replace a time-costly template with a custom extension. When you are in pre-prod, you feed your comparator a large representative sample of input and compare the result of your extension with the result of the legacy template; all of this is strings. – Alain Pannetier Feb 21 '11 at 20:45
  • @Alain Pannetier: +1 I agree now that the question was clarified: there is no need for an intermediate format for the traversed input source, and XPath will always be more flexible than low-level DOM methods. –  Feb 22 '11 at 12:31
  • Is the xpath approach possible in jdk 1.4? – ziggy Feb 23 '11 at 13:54
  • @ziggy. For JDK 1.4-2 (which shipped with JAXP 1.2), yes. For bare JDK 1.4, the JAXP version was 1.1, so you need to add jdom (which needs jaxen, provided with it). But nothing prevents you from adding JAXP 1.3 on top of JDK 1.4, which makes it equivalent to JDK 1.5. I've updated the answer with some minimalist code for these three configurations. Please feel free to ask for further detail. – Alain Pannetier Feb 23 '11 at 17:43
  • @Alain, I tried options 2 and 3 above as you suggested. For option 2, I couldn't import org.apache.xpath.XPathAPI. It was coming up with the error "Access restriction: The type XPathAPI is not accessible due to restriction on required library C:\Java\j2sdk1.4.1_07\jre\lib\rt.jar". I looked around on Google and it appears I have to play around with the libraries in the JVM to get it to work. I decided to go with option 3, which works flawlessly. I put the XPath expressions into a properties file and use a generic XML utility class which is used by all the parsers. – ziggy Mar 01 '11 at 10:20
  • Option 2 is for 1.4-**2**, and you seem to be on 1.4.1_07, which would explain why it does not work. I thought you might be on 1.4-2 but also included option 3 just in case. With hindsight, that was the best option for you... Well done. – Alain Pannetier Mar 01 '11 at 10:27
0

Since performance is important, I would suggest using a SAX parser for this. JAXB will give you roughly the same performance as DOM parsing, plus it will be much easier to use and more maintainable. Schema changes also should not affect you badly if you are using JAXB: just get the new schema and regenerate the classes. If you have a bridge between the JAXB classes and your domain logic, then the changes can be absorbed in that layer without the rest of the code worrying about XML. I prefer treating XML as just a message that is used in the messaging layer. All the application code should be agnostic of the XML schema.
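
For the SAX route, a minimal sketch could look like the following. It streams through the document and collects the text of every element with a given name, without building a tree in memory. The element name ref, the class name, and the sample document are assumptions for the example, not something taken from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RefKeySaxReader {

    // Collects the trimmed text content of every element named elementName.
    static List<String> readKeys(String xml, final String elementName) {
        final List<String> keys = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a target element

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (elementName.equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so accumulate.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (text != null && elementName.equals(qName)) {
                    keys.add(text.toString().trim());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return keys;
    }

    public static void main(String[] args) {
        String xml = "<batch><record><ref>A1</ref></record>"
                   + "<record><ref> B2 </ref></record></batch>";
        System.out.println(readKeys(xml, "ref")); // prints [A1, B2]
    }
}
```

Because the handler matches on the element name rather than its position, it tolerates changes in nesting, though a renamed element would still mean changing the name passed in.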

rahulmohan