3

So I have a project where I need to pull an XML file that is updated every 5 minutes, and I'm designing my program to pull the file every time it updates.

The data structure of the xml file is like this...

<m:REPORT_DATA>
    <m:DATA_ITEM>ENC</m:DATA_ITEM>
    <m:RESOURCE_NAME>DLAP</m:RESOURCE_NAME>
    <m:OPR_DATE>2012-06-02</m:OPR_DATE>
    <m:INTERVAL_NUM>1</m:INTERVAL_NUM>
    <m:VALUE>16.77734</m:VALUE>
</m:REPORT_DATA>
<m:REPORT_DATA>
    <m:DATA_ITEM>ENC</m:DATA_ITEM>
    <m:RESOURCE_NAME>DLAP</m:RESOURCE_NAME>
    <m:OPR_DATE>2012-06-02</m:OPR_DATE>
    <m:INTERVAL_NUM>2</m:INTERVAL_NUM>
    <m:VALUE>16.77739</m:VALUE>
</m:REPORT_DATA>
....

Assuming that I pull it for the 200th time that day, how would I grab just the last value

"<m:VALUE>16.77739</m:VALUE>"

And get that value for my database?

I'm torn between using SAX, XPath, or DOM. Some help would be amazing.

A_Elric

5 Answers

4

If you had a root, let's say <m:REPORTS>, finding the last VALUE using XPath would be rather simple:

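    // Requires: java.io.FileInputStream, org.xml.sax.InputSource, javax.xml.xpath.*
    XPathFactory f = XPathFactory.newInstance();
    XPath x = f.newXPath();
    try (FileInputStream in = new FileInputStream("logfile.xml")) {
        InputSource source = new InputSource(in);
        // local-name() matches the m:-prefixed elements without registering the namespace.
        // The predicates select the last REPORT_DATA whose DATA_ITEM is "ENC".
        XPathExpression expr = x.compile(
            "//*[local-name()='REPORT_DATA'][*[local-name()='DATA_ITEM']='ENC'][last()]/*[local-name()='VALUE']/text()");
        String s = expr.evaluate(source);
        System.out.println("Last value: " + s);
    } catch (Exception e) {
        System.err.println("Error: " + e);
    }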
mazaneicha
  • Would there be some way to do that where the data item = ENC & report_data[last()]/VALUE was still in place? – A_Elric Jul 03 '12 at 18:58
  • I am not sure I totally understood your comment, but I updated the code to search for the last REPORT_DATA having DATA_ITEM as "ENC". – mazaneicha Jul 03 '12 at 20:13
4

This isn't well-formed XML. You can use XPath to find the last node; for example, //REPORT_DATA[position() = last()] returns the last REPORT_DATA node. For reading XML with XPath, see How to read XML using XPath in Java.

//REPORT_DATA[last()]/DATA_ITEM[text()="ENC"]

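This returns the DATA_ITEM node of the last REPORT_DATA, provided its text equals "ENC".

To get the VALUE of the last REPORT_DATA whose DATA_ITEM equals "ENC", use //REPORT_DATA[DATA_ITEM="ENC"][last()]/VALUE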

Community
Pooya
0

This is not an "XML File" in the sense that it is not well-formed, since it has no root element (or it has multiple root elements). As such it cannot be loaded directly by an XML library, so you cannot use DOM, XPath, or XSLT.

You are better off using some simple pattern matching to detect the start of each segment, find the last segment, and then load only that segment into a DOM for extraction.
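
A minimal sketch of that idea, assuming the whole file fits comfortably in memory as a string and that each segment is delimited by literal `<m:REPORT_DATA>` and `</m:REPORT_DATA>` tags, as in the question. The class and method names (`LastSegment`, `lastValue`) are made up for illustration. It takes the last segment regardless of its DATA_ITEM, so filtering on "ENC" would need an extra check.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: locate the last REPORT_DATA segment by plain string
// search, then parse only that fragment into a DOM.
class LastSegment {
    private static final String OPEN = "<m:REPORT_DATA>";
    private static final String CLOSE = "</m:REPORT_DATA>";

    /** Returns the text of m:VALUE in the last segment, or null if none is found. */
    static String lastValue(String content) throws Exception {
        int start = content.lastIndexOf(OPEN);
        if (start < 0) {
            return null;
        }
        int end = content.indexOf(CLOSE, start);
        if (end < 0) {
            return null;
        }
        String segment = content.substring(start, end + CLOSE.length());

        // The default factory is not namespace-aware, so the fragment parses
        // even though the "m:" prefix is not declared inside it.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(segment)));
        NodeList values = doc.getElementsByTagName("m:VALUE");
        return values.getLength() == 0 ? null : values.item(0).getTextContent().trim();
    }
}
```

Because only one small fragment is parsed, the cost of building the DOM stays constant no matter how large the log grows.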

Jim Garrison
  • You are correct that this is not well-formed xml, but it would be simple enough to add top level tags to his input stream. So, I do not think this rules out the use of xml technologies. – Colin D Jul 03 '12 at 18:52
  • It does, I just didn't think it was important to include them since I said it was normal XML. There's about 50 more lines of header, but trust me when I tell you that the XML is compliant, well-formed, and handled by an agency that makes it compliant with every approach I have tried. My major issue has been finding out how to filter on data items of type ENC where it's the last value. – A_Elric Jul 03 '12 at 18:53
  • Why parse the entire log file, which could be hundreds of megabytes, just to process the last entry? That could end up being hideously wasteful. – Jim Garrison Jul 03 '12 at 18:53
  • Because the xml file maxes out at about 278k and after that it rolls over to a new day. When I pull the file it first checks to see if the file exists and simply deletes the old one before saving it locally. – A_Elric Jul 03 '12 at 18:57
  • And also, it's not my file. I am pulling from a large corporation that puts this data out, so the format and which records I'm pulling are well out of my hands. – A_Elric Jul 03 '12 at 18:58
0

Use SAX.

With either XPath or DOM, you have to build a DOM tree, which is slow and memory-expensive, especially for a single lookup.

SAX is faster, but is going to require you to keep track of your place and state, which in your case should be easy. Just look for your REPORT_DATA element, gather up its encapsulated data and if it is the last one (end document reached), you have your output.
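
Something along these lines could work as a starting point. It is only a sketch: the class names (`LastValueSax`, `LastValueHandler`) are invented, the parser is left non-namespace-aware so elements are matched by their prefixed names (`m:REPORT_DATA` and so on), and the `xmlns:m` URI in the sample is a placeholder rather than the real feed's namespace.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class LastValueSax {

    /** Remembers the VALUE of the most recent REPORT_DATA whose DATA_ITEM matches. */
    static class LastValueHandler extends DefaultHandler {
        private final String wantedItem;
        private final StringBuilder text = new StringBuilder();
        private String dataItem; // DATA_ITEM of the record being read
        private String value;    // VALUE of the record being read
        String lastValue;        // stays null until a matching record is seen

        LastValueHandler(String wantedItem) {
            this.wantedItem = wantedItem;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for the new element
            if (qName.equals("m:REPORT_DATA")) {
                dataItem = null;
                value = null;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            switch (qName) {
                case "m:DATA_ITEM":
                    dataItem = text.toString().trim();
                    break;
                case "m:VALUE":
                    value = text.toString().trim();
                    break;
                case "m:REPORT_DATA":
                    // Later matching records overwrite earlier ones, so the
                    // last one wins once the document ends.
                    if (wantedItem.equals(dataItem) && value != null) {
                        lastValue = value;
                    }
                    break;
                default:
                    break;
            }
        }
    }

    static String lastValue(InputStream in, String wantedItem) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        LastValueHandler handler = new LastValueHandler(wantedItem);
        parser.parse(in, handler);
        return handler.lastValue;
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for the real file; the namespace URI is a placeholder.
        String xml = "<m:REPORTS xmlns:m='urn:example:report'>"
                + "<m:REPORT_DATA><m:DATA_ITEM>ENC</m:DATA_ITEM><m:VALUE>16.77734</m:VALUE></m:REPORT_DATA>"
                + "<m:REPORT_DATA><m:DATA_ITEM>ENC</m:DATA_ITEM><m:VALUE>16.77739</m:VALUE></m:REPORT_DATA>"
                + "</m:REPORTS>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println("Last value: " + lastValue(in, "ENC"));
    }
}
```

The handler keeps only one record's worth of state at a time, so memory use does not depend on the size of the file.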

Colin D
0
    // Requires: java.io.File, java.io.IOException, javax.xml.parsers.*, org.w3c.dom.*, org.xml.sax.SAXException
    // filePath: path to the file to parse; tag: name of the element to search for, e.g. "m:VALUE".
    public static String getLastNode(String filePath, String tag)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new File(filePath));

        // Take the last element with the given tag name, in document order.
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return null; // no element with that tag name
        }
        return nodes.item(nodes.getLength() - 1).getTextContent();
        // If you don't care about a specific tag name, just use:
        // return doc.getLastChild().getTextContent();
    }
tk66