
I have a parser that collects the required fields from an XML document and constructs an object from them. Suppose the XML looks like this:

<xml>
<p1>
...
...
</p1>
<p2>
...
</p2>
...
...
</xml>

My Java code parses it roughly as follows:

for each product // p1, p2, etc.
    print start time
    parse that node, which returns an object
    print end time
    add the object to the list

The sample code is below

products = (NodeList) xPath.evaluate("/xml/product", pxml, XPathConstants.NODESET);
for (int i = 0; i < products.getLength(); i++)
{
    System.out.println("parsing product ::" + i + ":" + (System.currentTimeMillis() - time));
    BookDataInfo _parsedPoduct = ParseProduct(products.item(i));
    System.out.println("parsing product finished ::" + i + ":" + (System.currentTimeMillis() - time));
    if (_parsedPoduct.getParsingSucceeded())
    {
        pparsedProducts.add(_parsedPoduct);
    }
}

I print the time before and after parsing each node, and the time per product increases exponentially with the number of products: the 1st product takes about 100 ms, whereas the 300th takes about 2000 ms. The same code is executed for every product. Does anyone have an idea why this happens?

I can't post what ParseProduct does, but I found out where most of the time is spent:

private NodeList getNodelist(Node xml, String Name)
{
    long time = System.currentTimeMillis();
    System.out.println("Nodelist start::" + (System.currentTimeMillis() - time));
    NodeList nodes = (NodeList)xPath.evaluate(Name,xml,XPathConstants.NODESET);
    System.out.println("Nodelist end::" + (System.currentTimeMillis() - time));
    return nodes;
}

Similarly, a single node value is fetched with a statement like Node node = (Node)xPath.evaluate(Name,xml,XPathConstants.NODE);

Here xPath is a static object of type XPath. When the function above is called multiple times for a product, the later calls take much longer: at the start each call took 2-3 ms, but later (say, at product 300) each call took 55-60 ms.

Maybe I am missing something here? Thanks!

Mahesh
  • Kind of hard to have an idea if we don't have the code, nor sample XSD content... – Simon Verhoeven Nov 07 '13 at 07:22
  • What data structure are you using to store the products? – Evans Nov 07 '13 at 07:22
  • My idea is you are reinventing the wheel. What you should try instead is to use some standard or at least widespread library for parsing / transforming or accessing your xml. – Matthias Nov 07 '13 at 07:28
  • I am using DOM parser and XPATH to get the required nodes. – Mahesh Nov 07 '13 at 07:38
  • Maybe you scan the document from the beginning for each element, so if you are extracting the tenth you are fast-forwarding nine elements, and if you are processing the 100th, you are fast-forwarding 99. – Alexander Torstling Nov 07 '13 at 07:40
  • Data structure is java ArrayList. – Mahesh Nov 07 '13 at 07:40
  • In each parse I am sending the node to be parsed as an argument, so I hope fast-forwarding is not happening. I edited the post with sample code as well. – Mahesh Nov 07 '13 at 07:49
  • Your timing code makes no sense. You should decide whether to refer to the *current time* or to the *elapsed time*. But your code refers to the elapsed time since an arbitrary point held in the variable `time`, which never gets updated. Measure elapsed time for an operation like this: `long startTimeRef=System.nanoTime(); operation(); long elapsedTime=System.nanoTime()-startTimeRef;` Btw., if `ParseProduct` really has a performance problem, you can’t analyse it without looking at it, so without posting the code of that method no one can tell you if there’s something wrong with it. – Holger Nov 07 '13 at 08:07
  • A technicality perhaps: your data demonstrates that the time is increasing, but not that it is increasing exponentially. It's much more likely that it is increasing quadratically. It can be useful to know the difference, because it will help identify the cause. – Michael Kay Nov 07 '13 at 10:24
  • Updated the question with more info. Maybe it will give you more details for understanding the issue. – Mahesh Nov 07 '13 at 12:01
  • The problem is solved. The main issue is the one described in the link below. http://stackoverflow.com/questions/3782618/xpath-evaluate-performance-slows-down-absurdly-over-multiple-calls Following the steps mentioned there drastically reduced the time consumed. – Mahesh Nov 07 '13 at 12:48
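
Holger's suggestion in the comments (take a fresh reference point per operation and use System.nanoTime) can be sketched as below. This is only an illustration: the class name, the elapsedNanos helper, and the parseProductStandIn method are hypothetical and stand in for the real ParseProduct, which was not posted.

```java
class ElapsedTimeSketch {

    /** Runs the operation once and returns how long that single run took, in nanoseconds. */
    static long elapsedNanos(Runnable operation) {
        long startTimeRef = System.nanoTime(); // fresh reference point for this call only
        operation.run();
        return System.nanoTime() - startTimeRef;
    }

    /** Hypothetical stand-in for the real per-product parsing work. */
    static void parseProductStandIn(int index) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < 10_000; k++) {
            sb.append(index);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            final int index = i;
            long nanos = elapsedNanos(() -> parseProductStandIn(index));
            // Each line reports the cost of one product, not the time since the loop started.
            System.out.println("parsing product " + i + " took " + nanos / 1_000_000.0 + " ms");
        }
    }
}
```

Measured this way, a per-product slowdown shows up directly as growing numbers, instead of being mixed with the cumulative time since a single start point.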

2 Answers


Check out the difference between DOM and SAX parsing. DOM lets you query the XML file, but it has to load the entire document into memory to do so. If you just want to create objects, you are better off using a SAX parser.
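
A minimal sketch of what the SAX approach could look like, assuming each product carries a name child element. The element name, the class name, and the productNames helper are hypothetical, since the real document structure was not posted.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxProductSketch {

    /** Streams through the XML once and collects the text of every <name> element. */
    static List<String> productNames(String xmlText) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <name> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("name".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length); // may be called several times per element
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    names.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xmlText)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<xml><product><name>a</name></product><product><name>b</name></product></xml>";
        System.out.println(productNames(xml));
    }
}
```

The trade-off is that SAX gives no random access or XPath queries; the handler has to track its own position in the document.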

Guy Gavriely

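The problem is solved. The main issue is the one described in the linked question: XPath.evaluate performance slows down (absurdly) over multiple calls (http://stackoverflow.com/questions/3782618/xpath-evaluate-performance-slows-down-absurdly-over-multiple-calls).

Following the steps mentioned there drastically reduced the time consumed.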
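
The steps themselves are not quoted in this thread. One workaround commonly suggested for this symptom is to detach each node from the large document before running relative XPath expressions against it, so the sketch below assumes that is roughly what was done. The name child element, the class name, and the productNames helper are hypothetical; only the /xml/product expression comes from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DetachedXPathSketch {

    /** Evaluates a relative XPath on each product after detaching it from the document. */
    static List<String> productNames(String xmlText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlText)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList products =
                    (NodeList) xPath.evaluate("/xml/product", doc, XPathConstants.NODESET);

            List<String> names = new ArrayList<>();
            for (int i = 0; i < products.getLength(); i++) {
                Node product = products.item(i);
                // Assumed workaround: remove the node from its parent so that later
                // evaluations work on a small detached subtree, not the whole document.
                product.getParentNode().removeChild(product);
                names.add((String) xPath.evaluate("name", product, XPathConstants.STRING));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<xml><product><name>a</name></product><product><name>b</name></product></xml>";
        System.out.println(productNames(xml));
    }
}
```

Detaching modifies the DOM, so this only fits when the document is not needed intact afterwards; working on a copy made with cloneNode(true) is an alternative when it is.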

Community
Mahesh