I am getting a strange error from my DocumentBuilderFactory parser. The error is the following:

[Fatal Error] standard_000000_3.xml:1221888:48: The element type "tduid" must be terminated by the matching end-tag "</tduid>".
org.xml.sax.SAXParseException; systemId: file:/home/000000/new/standard_000000_3.xml; lineNumber: 1221888; columnNumber: 48; The element type "tduid" must be terminated by the matching end-tag "</tduid>".
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205)
    at com.company.batch.BatchReader.<init>(BatchReader.java:46)
    at com.company.batch.BatchFile.open(BatchFile.java:76)
    at application.Daemon.checkNewImportFiles(Daemon.java:385)
    at application.Daemon.startApplication(Daemon.java:68)
    at application.Daemon.run(Daemon.java:36)
[Fatal Error] standard_000000_9.xml:1049516:32: XML document structures must start and end within the same entity.
org.xml.sax.SAXParseException; systemId: file:/home/000000/new/standard_XXXXXX_9.xml; lineNumber: 1049516; columnNumber: 32; XML document structures must start and end within the same entity.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205)
    at com.company.batch.BatchReader.<init>(BatchReader.java:46)
    at com.company.batch.BatchFile.open(BatchFile.java:76)
    at application.Daemon.checkNewImportFiles(Daemon.java:385)
    at application.Daemon.startApplication(Daemon.java:68)
    at application.Daemon.run(Daemon.java:36)

I know that these errors usually point to a problem in the structure of the XML, but that is not the case here: I validated the file with xmllint and it reported nothing. Stranger still, if I create a Java application containing only the class that manages the file, with a main method that only opens the file, changes 2-3 attributes and saves it, everything works and the error does not occur.

I thought it might be a memory problem, so I ran the process while monitoring the amount of memory used: the maximum Java memory on the system is 900 MB, and the program never requires more than 400 MB.
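
For reference, heap usage can also be logged from inside the daemon itself rather than watched from outside. The following is a minimal sketch; the class name `HeapReport` is hypothetical and not part of the original program.

```java
class HeapReport {
    /** Describes current heap usage of this JVM, rounded down to whole megabytes. */
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        // Heap currently in use = heap reserved so far minus the free part of it.
        long used = rt.totalMemory() - rt.freeMemory();
        // maxMemory() is the upper bound the JVM will try to use (e.g. set via -Xmx).
        return String.format("used=%d MB, max=%d MB", used >> 20, rt.maxMemory() >> 20);
    }
}
```

Printing `HeapReport.describe()` right before and after the `parse` call would show how close the process gets to the limit. An exhausted heap would normally surface as an `OutOfMemoryError` rather than a `SAXParseException`.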

An example of the XML file at the point where the error occurs (first error; it is reported at the opening <onPurchase> tag):

<transaction>
  <eventId>123456</eventId>
  <orderNumber>TEST_ORDER</orderNumber>
  <orderValue>0</orderValue>
  <currency>USD</currency>
  <tduid>testtesttesttesttesttesttesttest</tduid>
  <timestamp>2016-03-05 15:23:00 GMT</timestamp>
  <extraReportingInfo>
    <isUniveralStoreNewPurchaser>True</isUniveralStoreNewPurchaser>
    <onEntry>
      <productType>Test product</productType>
      <tuner>TEST TUNER</tuner>
      <userOs>Linux[2.0.10340.184]</userOs>
      <userDevice>Linux.Ubuntu</userDevice>
    </onEntry>
    <onPurchase>
      <productType>Test Product</productType>
      <tuner>TEST TUNER</tuner>
      <contentOwnership>TST</contentOwnership>
      <userDevice>Unknown</userDevice>
    </onPurchase>
  </extraReportingInfo>
  <reportInfo>
    <item>
      <productNumber>PRODUCT_NUMBER</productNumber>
      <productName>PRODUCT_NAME</productName>
      <price>0</price>
      <quantity>1</quantity>
    </item>
  </reportInfo>
</transaction>

This is the code that loads the XML file:

public BatchReader(String Filename) {
    try {
        this.filename = Filename;
        File XMLFile = new File(Filename);
        DocumentBuilderFactory DBFactory = DocumentBuilderFactory.newInstance();

        DocumentBuilder DBBuilder = DBFactory.newDocumentBuilder();
        DBFactory.setValidating(true);
        this.doc = DBBuilder.parse(XMLFile);

        // http://stackoverflow.com/questions/13786607/normalization-in-dom-parsing-with-java-how-does-it-work
        this.doc.getDocumentElement().normalize();

        // Added for debug
        System.out.println(XMLFile.getAbsolutePath());

        // Setting the batch type
        this.Type = "standard";

        this.organizationId = Integer.parseInt(this.getString("organizationId", this.doc.getDocumentElement()));

        this.Sequence = (this.getString("sequenceNumber", this.doc.getDocumentElement()) != null) ? Integer.parseInt(this.getString("sequenceNumber", this.doc.getDocumentElement())) : 0;
        this.checksum = (this.getString("checksum", this.doc.getDocumentElement()) != null) ? this.getString("checksum", this.doc.getDocumentElement()) : null;

        //this.checksum = this.getString("checksum", this.doc.getDocumentElement());

        this.txAmount = this.doc.getElementsByTagName("transaction").getLength();
    } catch (NullPointerException | NumberFormatException e) {
        e.printStackTrace();
    } catch (ParserConfigurationException | SAXException | IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

The error happens on this line:

this.doc = DBBuilder.parse(XMLFile);
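
To see what actually sits at the position the parser reports, the lines around the reported line number can be printed. This is a minimal sketch, not part of the original program; the class `LineContext` and its method `around` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class LineContext {
    /**
     * Returns the lines from (target - radius) to (target + radius), 1-based,
     * each prefixed with its line number. Stops reading once past the window,
     * so a large file is not read to the end unnecessarily.
     */
    static List<String> around(Reader source, long target, int radius) throws IOException {
        List<String> out = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            long number = 0;
            while ((line = reader.readLine()) != null) {
                number++;
                if (number > target + radius) {
                    break; // past the window, nothing more to collect
                }
                if (number >= target - radius) {
                    out.add(number + ": " + line);
                }
            }
        }
        return out;
    }
}
```

In a `catch (SAXParseException e)` block, the target would be `e.getLineNumber()`, with `e.getColumnNumber()` pointing into that line. Opening the file through an `InputStreamReader` with `StandardCharsets.ISO_8859_1` is one option here, since that charset maps every byte to a character and so never fails on unexpected bytes.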

The file is 73 MB.
I honestly don't know where else to look for the problem.
Could you help me, please?


Edit:

In one of the files I think I found the error, in this line: <productName>ПÑтница!</productName>.

This seems to have been encoded in ANSI instead of UTF-8.

I'm going to check whether the other files contain something similar.
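
Checking the other files by eye is tedious at this size, so a small scan for bytes that are not valid UTF-8 may help. The following is a minimal sketch under the assumption that the files are supposed to be UTF-8 throughout; the class `Utf8Check` and its method `firstInvalidOffset` are hypothetical names, not part of the original program.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class Utf8Check {
    /** Returns the byte offset of the first malformed UTF-8 sequence, or -1 if there is none. */
    static long firstInvalidOffset(InputStream in) throws IOException {
        // REPORT makes the decoder signal bad input instead of silently replacing it.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer bytes = ByteBuffer.allocate(8192);
        CharBuffer chars = CharBuffer.allocate(8192);
        long consumed = 0; // bytes decoded and dropped from the buffer in earlier rounds
        boolean eof = false;
        while (!eof) {
            int n = in.read(bytes.array(), bytes.position(), bytes.remaining());
            if (n < 0) {
                eof = true;
            } else {
                bytes.position(bytes.position() + n);
            }
            bytes.flip();
            while (true) {
                CoderResult result = decoder.decode(bytes, chars, eof);
                if (result.isError()) {
                    // On error the buffer position is at the start of the bad sequence.
                    return consumed + bytes.position();
                }
                chars.clear(); // decoded text is not needed, only the validity check
                if (result.isUnderflow()) {
                    break;
                }
            }
            consumed += bytes.position();
            bytes.compact(); // keep an incomplete trailing sequence for the next round
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(args[0] + ": " + firstInvalidOffset(in));
        }
    }
}
```

Run against each batch file, this prints -1 for a file that decodes cleanly as UTF-8 and otherwise the byte offset where decoding first fails. It only detects bytes that are invalid as UTF-8; text that was converted twice and is still valid UTF-8 would not be flagged.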

peterh
  • Just a wild guess here but my intuition tells me that you've got a < or > somewhere inside a tag in your xml file. Are you escaping the text contents of your xml? – Neil Mar 07 '16 at 10:05
  • Hi @Neil, thanks for asking. If it were like that it should be signaled by xmllint: [root@test error]# xmllint --noout standard_000000_3.xml; echo $? 0 – Emanuele Graziano Mar 07 '16 at 10:28
  • Have you tried simply jumping to line 1221888 and seeing what's there? If the tag that starts there also ends correctly? You assume xmllint would catch it, but I think that's a dangerous assumption in this case. Assume nothing and you'll fix it faster. – Neil Mar 07 '16 at 10:53
  • Hi @Neil, I already did that and it was correct. I think I found the problem (I edited the question). Thanks! – Emanuele Graziano Mar 07 '16 at 11:38
  • Maybe you should answer your own question with the solution you found to be valid and resolve this? – razvanone Oct 06 '16 at 11:27
