I have a Java class that parses an XML file and writes its content to MySQL. Everything works fine until the XML file contains invalid Unicode characters: then an exception is thrown and the program stops parsing the file.
My provider sends this XML file daily with a list of products and their prices, quantities, etc. I have no control over it, so invalid characters will always be there.
All I'm trying to do is catch these errors, ignore them, and continue parsing the rest of the XML file.
I've added try-catch statements to the startElement, endElement and characters methods of the SAXHandler class. However, they don't catch any exception, and execution stops whenever the parser finds an invalid character.
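For context, as far as I understand it, a SAX parser does not raise well-formedness errors inside the content callbacks. It reports them through the handler's fatalError method and then throws from parse(), which would explain why try-catch blocks inside startElement, endElement and characters never fire. Here is a minimal sketch of that behavior. The class FatalErrorSketch, the handler LocatingHandler and the tiny input document are hypothetical and only for illustration; they are not my real SAXHandler or my provider's data.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class FatalErrorSketch {

    /** Hypothetical handler that records where the parser gave up. */
    public static class LocatingHandler extends DefaultHandler {
        public SAXParseException lastFatal; // set only if fatalError is invoked
        public int elementsSeen;            // start tags delivered before the failure

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementsSeen++;
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXParseException {
            // The parser calls this itself; no content callback is on the stack.
            lastFatal = e;
            throw e; // rethrow, as DefaultHandler does by default
        }
    }

    /** Parses the given text and returns the handler so its state can be inspected. */
    public static LocatingHandler parse(String xml) throws Exception {
        LocatingHandler handler = new LocatingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            // Only here, around parse(), does the exception become catchable.
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // 0x1A is a control character that XML 1.0 does not allow in content.
        String xml = "<root>\n<a>X" + (char) 0x1A + "Y</a>\n<b/></root>";
        LocatingHandler h = parse(xml);
        System.out.println("fatal error at line " + h.lastFatal.getLineNumber()
                + ", elements seen before it: " + h.elementsSeen);
    }
}
```

The recorded location is useful for logging which product is broken, but it does not let the parse resume: once fatalError has been reported, the rest of the document is not delivered.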
It seems that I can only catch these exceptions in the method that calls the parser:
    try {
        myIS = new FileInputStream(xmlFilePath);
        parser.parse(myIS, handler);
        retValue = true;
    } catch (SAXParseException err) {
        System.out.println("SAXParseException " + err);
    }
However, that's useless in my case: even though the exception tells me where the invalid character is, execution stops, so the list of products is far from complete. The list has about 8,000 products and only a couple of invalid characters, but if an invalid character appears within the first 100 products, the remaining 7,900 products are not updated in the database. I've also noticed that the endDocument method is not called if an exception occurs.
Somebody asked the same question here some years ago, but didn't get any solution.
I'd really appreciate any ideas or workarounds for this.
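One workaround that might apply here is to clean the character stream before the parser sees it, so the fatal error never happens. The sketch below is not a confirmed solution and rests on several assumptions: the file is UTF-8, the offending characters are raw characters outside the XML 1.0 Char range (not numeric references such as `&#x1A;`), and replacing them with a space is acceptable. The names XmlCharFilterSketch, InvalidXmlCharFilterReader and parseSkus, and the `<Productos>` root element, are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XmlCharFilterSketch {

    /** Hypothetical reader that replaces characters not allowed by XML 1.0 with a space. */
    public static class InvalidXmlCharFilterReader extends FilterReader {
        public InvalidXmlCharFilterReader(Reader in) { super(in); }

        private static boolean isAllowed(char c) {
            return c == 0x9 || c == 0xA || c == 0xD
                    || (c >= 0x20 && c <= 0xD7FF)
                    || (c >= 0xD800 && c <= 0xDFFF) // surrogates pass through; pairing is not checked here
                    || (c >= 0xE000 && c <= 0xFFFD);
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            if (c == -1) return -1;
            return isAllowed((char) c) ? c : ' ';
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            for (int i = off; i < off + n; i++) { // loop is skipped when n is -1
                if (!isAllowed(buf[i])) buf[i] = ' ';
            }
            return n;
        }
    }

    /** Parses the bytes through the filter and collects the text of every Sku element. */
    public static List<String> parseSkus(byte[] xmlBytes) throws Exception {
        List<String> skus = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0);
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("Sku".equals(qName)) skus.add(text.toString());
            }
        };
        // Assumption: UTF-8 input. InputStreamReader substitutes U+FFFD for malformed bytes.
        try (Reader reader = new InvalidXmlCharFilterReader(
                new InputStreamReader(new ByteArrayInputStream(xmlBytes), StandardCharsets.UTF_8))) {
            parser.parse(new InputSource(reader), handler);
        }
        return skus;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Productos><Producto><Sku>A" + (char) 0x1A + "B</Sku></Producto>"
                + "<Producto><Sku>PT910EPS27</Sku></Producto></Productos>";
        System.out.println(parseSkus(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

For a real file, the ByteArrayInputStream would be swapped for the FileInputStream on xmlFilePath. Because the parser receives a character stream, it ignores any encoding declared in the XML prolog, so the charset passed to InputStreamReader has to match the actual file.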
Data Sample (as requested):
<Producto>
  <Brand>
    <Description>Epson</Description>
    <ManufacturerId>eps</ManufacturerId>
    <BrandId>eps</BrandId>
  </Brand>
  <New>false</New>
  <OnSale>null</OnSale>
  <Type>Physical</Type>
  <Description>Epson TM T88V - Impresora de recibos - línea térmica - rollo 8 cm - hasta 300 mm/segundo - paralelo, USB</Description>
  <Category>
    <CategoryId>pos</CategoryId>
    <Description>Puntos de Venta</Description>
    <Subcategories>
      <CategoryId>pos.printer</CategoryId>
      <Description>Impresoras para Recibos</Description>
    </Subcategories>
  </Category>
  <InStock>0</InStock>
  <Price>
    <UnitPrice>4865.6042</UnitPrice>
    <CurrencyId>MXN</CurrencyId>
  </Price>
  <Manufacturer>
    <Description>Epson</Description>
    <ManufacturerId>eps</ManufacturerId>
  </Manufacturer>
  <Mpn>C31CA85814</Mpn>
  <Sku>PT910EPS27</Sku>
  <CompilationDate>2020-02-25T12:30:14.6607135Z</CompilationDate>
</Producto>