I am using XQuery to query a large XML document. Will using the XQuery doc() function cause a heap out-of-memory error? How can I use XQuery in Java to query a large XML file? An explanation with an example would be appreciated.

programmer
  • Large XML files are poor candidates for XQuery. A better option is to stream the XML file through a SAX parser to selectively search for the object. Performance-wise, a 300MB file can be navigated and updated in a few seconds using SAX. – Compass Jun 19 '17 at 19:26
  • If the query is sufficiently simple to process using SAX, then it is probably amenable to processing using a streaming XQuery processor, which will be just as fast because the performance is dominated by XML parsing. The benefit is that the XQuery solution will be 3 lines of code rather than 300 lines for the SAX solution. – Michael Kay Jun 21 '17 at 10:48

2 Answers

First of all, 150 MB is not that huge, considering how powerful today's machines are. If it grows to gigabytes, consider StAX or SAX instead.
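
As a rough illustration of the StAX route, here is a minimal sketch using only the JDK's javax.xml.stream API. The class name StaxRowCounter, the method countElements and the element name "row" are made up for this example and do not come from the question. It reads the input one event at a time, so memory use should stay roughly flat regardless of file size.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of any library.
final class StaxRowCounter {

    // Counts elements with the given local name without building a tree in memory.
    static int countElements(InputStream in, String localName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        try {
            while (reader.hasNext()) {
                // next() advances the cursor and returns the type of the new event.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory sample; for a real file, pass a FileInputStream instead.
        String xml = "<rows><row id=\"1\"/><row id=\"2\"/><other/></rows>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in, "row"));
    }
}
```

The same loop can collect attribute or text values instead of counting; the point is only that nothing outside the current event is held in memory.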

XPath/XQuery resource usage depends on the implementation. With Dom4J, for example, XPath/XQuery is often significantly less resource-heavy than DOM, but this depends on other factors, such as the length of the document (i.e. how many 'childNode' elements you have) and where in the document the data you are interested in is located.

Quoted from https://stackoverflow.com/a/725007/6785908:

XPath memory usage and completion time tends to increase the further down the document you go. For example, let's say you have an XML document with 20,000 childNode elements, each childNode has a unique identifier that you know in advance, and you want to extract a known childNode from the document. Extracting the 18,345th childNode would use much, much, much more memory than extracting the 3rd.

So if you are using XPath to extract all childNode elements, you may find it less efficient than parsing into a DOM. XPath is generally an easy way of extracting a portion of an XML document. I'd not recommend using it for processing all of an XML document.
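
To make the "extract a known childNode" case concrete, here is a small sketch using the JDK's built-in javax.xml.xpath API rather than Dom4J. The class name ChildNodeLookup, the root element name and the id attribute are assumptions for the example. Note that when given an InputSource, the JDK implementation parses the whole input into a tree before evaluating, so this is convenient but not a low-memory technique.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; element and attribute names are assumed.
final class ChildNodeLookup {

    // Returns the text of the childNode whose id attribute matches, or "" if none does.
    static String findChildText(String xml, String id) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $id through a variable resolver instead of concatenating it into the expression.
        xpath.setXPathVariableResolver(name -> "id".equals(name.getLocalPart()) ? id : null);
        return xpath.evaluate("/root/childNode[@id = $id]",
                new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>"
                + "<childNode id=\"a\">first</childNode>"
                + "<childNode id=\"b\">second</childNode>"
                + "</root>";
        System.out.println(findChildText(xml, "b")); // prints "second"
    }
}
```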

Spring XQuery examples

https://github.com/spring-projects/spring-integration-extensions/tree/master/samples/xquery

Example of XQuery using Java

This is what I got from the first Google search result: https://docs.oracle.com/database/121/ADXDK/adx_j_xqj.htm#ADXDK115

import javax.xml.xquery.XQConnection;
import javax.xml.xquery.XQException;
import javax.xml.xquery.XQPreparedExpression;
import javax.xml.xquery.XQSequence;

import oracle.xml.xquery.OXQDataSource;

public class HelloWorld {

    public static void main(String[] args) throws XQException {
        OXQDataSource ds = new OXQDataSource();
        XQConnection con = ds.getConnection();
        String query = "<hello-world>{1 + 1}</hello-world>";
        XQPreparedExpression expr = con.prepareExpression(query);
        XQSequence result = expr.executeQuery();

        // prints "<hello-world>2</hello-world>"
        System.out.println(result.getSequenceAsString(null));

        result.close();
        expr.close();
        con.close();
    }

} 

To reiterate: for processing a 150 MB XML file, you shouldn't need to worry too much about the memory footprint.

so-random-dude
  • The post you cite regarding XPath performance is specifically about the DOM4J product, and cannot be extrapolated to other implementations. – Michael Kay Jun 21 '17 at 10:51
  • @MichaelKay Thanks for helping me correct my mistake. I overlooked the fact that it was for a particular implementation, mea culpa! Thanks. – so-random-dude Jun 22 '17 at 04:54

150 MB is not vast nowadays, and a decent XQuery processor should be able to handle it in memory. It's very difficult to give general answers to this question without knowing what XQuery processor you intend to use.

Beyond that, it depends very much on what the query is doing (which you haven't told us).

For join queries, getting acceptable performance will depend on how good the optimizer in your XQuery processor is.

Some queries will benefit greatly from a technique called "document projection" which analyses the query to determine which parts of the document are needed, and avoids allocating memory to those parts of the tree that are not accessed by the query. Check whether your XQuery processor supports this technique. (Saxon does, for example, but only in Saxon-EE, and it's not the default).

Furthermore, some queries may be streamable, which means there is no need to build a tree in memory at all. Again, check whether your chosen XQuery processor supports streaming. Saxon does - again only in Saxon-EE, and you have to request it with an option on the command line.

Michael Kay
  • The XML will be like this: 123ERRR 4487665 12\23\2016. There will be several rows like this. What I need to do is sort out the row where the updated dt is latest against the same mobile no. – programmer Jun 22 '17 at 04:03
  • Could you write that again, please, this time in English? – Michael Kay Jun 22 '17 at 14:03