I am getting a strange error from my DocumentBuilderFactory parser. The error is the following:
[Fatal Error] standard_000000_3.xml:1221888:48: The element type "tduid" must be terminated by the matching end-tag "</tduid>".
org.xml.sax.SAXParseException; systemId: file:/home/000000/new/standard_000000_3.xml; lineNumber: 1221888; columnNumber: 48; The element type "tduid" must be terminated by the matching end-tag "</tduid>".
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205)
at com.company.batch.BatchReader.<init>(BatchReader.java:46)
at com.company.batch.BatchFile.open(BatchFile.java:76)
at application.Daemon.checkNewImportFiles(Daemon.java:385)
at application.Daemon.startApplication(Daemon.java:68)
at application.Daemon.run(Daemon.java:36)
[Fatal Error] standard_000000_9.xml:1049516:32: XML document structures must start and end within the same entity.
org.xml.sax.SAXParseException; systemId: file:/home/000000/new/standard_XXXXXX_9.xml; lineNumber: 1049516; columnNumber: 32; XML document structures must start and end within the same entity.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205)
at com.company.batch.BatchReader.<init>(BatchReader.java:46)
at com.company.batch.BatchFile.open(BatchFile.java:76)
at application.Daemon.checkNewImportFiles(Daemon.java:385)
at application.Daemon.startApplication(Daemon.java:68)
at application.Daemon.run(Daemon.java:36)
I know these errors usually point to a problem in the structure of the XML, but that does not seem to be the case here. I tried validating the file with xmllint. Strangely, if I create a standalone Java application containing only the class that manages the file, with a main method that just opens it, changes 2-3 attributes and saves it, everything works and the error does not occur.
I thought it might be a memory problem, so I ran the process while monitoring its memory usage: the maximum Java memory on the system is 900 MB, and the program never needs more than 400 MB.
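For reference, a minimal sketch of how the memory figures above can be logged from inside the JVM. The class name `HeapUsage` and its helper methods are hypothetical and not part of my application; they only use the standard `Runtime` API.

```java
// Hypothetical helper, not part of the original application.
class HeapUsage {

    // Heap currently in use by the JVM, in megabytes.
    static long usedMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    // Upper limit the JVM will try to use for the heap (e.g. set with -Xmx), in megabytes.
    static long maxMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("used: " + usedMb() + " MB, max: " + maxMb() + " MB");
    }
}
```

Calling `HeapUsage.usedMb()` right before and right after the `parse` call gives a rough idea of how much the DOM tree costs.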
An example of the XML at the point where the error occurs (this is for the first error; it is reported at the opening <onPurchase> tag):
<transaction>
    <eventId>123456</eventId>
    <orderNumber>TEST_ORDER</orderNumber>
    <orderValue>0</orderValue>
    <currency>USD</currency>
    <tduid>testtesttesttesttesttesttesttest</tduid>
    <timestamp>2016-03-05 15:23:00 GMT</timestamp>
    <extraReportingInfo>
        <isUniveralStoreNewPurchaser>True</isUniveralStoreNewPurchaser>
        <onEntry>
            <productType>Test product</productType>
            <tuner>TEST TUNER</tuner>
            <userOs>Linux[2.0.10340.184]</userOs>
            <userDevice>Linux.Ubuntu</userDevice>
        </onEntry>
        <onPurchase>
            <productType>Test Product</productType>
            <tuner>TEST TUNER</tuner>
            <contentOwnership>TST</contentOwnership>
            <userDevice>Unknown</userDevice>
        </onPurchase>
    </extraReportingInfo>
    <reportInfo>
        <item>
            <productNumber>PRODUCT_NUMBER</productNumber>
            <productName>PRODUCT_NAME</productName>
            <price>0</price>
            <quantity>1</quantity>
        </item>
    </reportInfo>
</transaction>
This is the code that loads the XML file:
public BatchReader(String Filename) {
    try {
        this.filename = Filename;
        File XMLFile = new File(Filename);
        DocumentBuilderFactory DBFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder DBBuilder = DBFactory.newDocumentBuilder();
        DBFactory.setValidating(true);
        this.doc = DBBuilder.parse(XMLFile);
        // http://stackoverflow.com/questions/13786607/normalization-in-dom-parsing-with-java-how-does-it-work
        this.doc.getDocumentElement().normalize();
        // Added for debug
        System.out.println(XMLFile.getAbsolutePath());
        // Setting the batch type
        this.Type = "standard";
        this.organizationId = Integer.parseInt(this.getString("organizationId", this.doc.getDocumentElement()));
        this.Sequence = (this.getString("sequenceNumber", this.doc.getDocumentElement()) != null) ? Integer.parseInt(this.getString("sequenceNumber", this.doc.getDocumentElement())) : 0;
        this.checksum = (this.getString("checksum", this.doc.getDocumentElement()) != null) ? this.getString("checksum", this.doc.getDocumentElement()) : null;
        //this.checksum = this.getString("checksum", this.doc.getDocumentElement());
        this.txAmount = this.doc.getElementsByTagName("transaction").getLength();
    } catch (NullPointerException | NumberFormatException e) {
        e.printStackTrace();
    } catch (ParserConfigurationException | SAXException | IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
The error happens on this line:
this.doc = DBBuilder.parse(XMLFile);
The file is 73 MB.
I honestly don't know where else to look for the problem.
Could you help me, please?
Edit:
In one of the files I think I found the error, on this line:
<productName>ПÑтница!</productName>
This seems to have been encoded in ANSI instead of UTF-8.
I'm going to check whether the other files contain something similar.
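To check the other files, a quick scan for bytes that are not valid UTF-8 could look like the sketch below. It assumes the files are supposed to be UTF-8 and that a whole file fits in memory (fine for 73 MB). The class `Utf8Check` and the method `firstInvalidOffset` are hypothetical names of mine; the code relies only on the standard `java.nio.charset` decoder configured to report malformed input instead of silently replacing it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original application.
class Utf8Check {

    // Returns the byte offset of the first malformed UTF-8 sequence,
    // or -1 if the whole array decodes cleanly as UTF-8.
    static long firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(8192);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // The input position is left at the start of the bad sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without errors
            }
            out.clear(); // output buffer full: discard decoded chars and continue
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            long offset = firstInvalidOffset(Files.readAllBytes(Paths.get(name)));
            System.out.println(name + ": "
                    + (offset < 0 ? "valid UTF-8" : "invalid byte sequence at offset " + offset));
        }
    }
}
```

Running it with the batch file names as arguments prints one line per file. Note that text which was wrongly converted but still stored as valid UTF-8 would pass this check, so a clean result does not rule out garbled characters.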