I have an old Java application that processes XML from a third-party data feed.
The data feed allows user input, and it has suddenly started containing emojis.
I'm actually surprised it took this long for this problem to appear (emojis have been around for a few years now).
The app blows up in javax.xml.parsers.DocumentBuilder.parse(InputStream):
org.xml.sax.SAXParseException; lineNumber: 105; columnNumber: 3039; Character reference "&#
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:348)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
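
A minimal sketch that might reproduce this kind of failure is below. It rests on an assumption, since the error message above is cut off: that the feed writes each emoji as two numeric character references, one per UTF-16 surrogate, rather than as a single reference to the full code point. The class name, the `parseText` helper, and the sample `<msg>` documents are all hypothetical and are not taken from the real feed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

public class EmojiReferenceRepro {

    // Hypothetical helper: parses the XML string the same way the app does
    // (DocumentBuilder.parse(InputStream)) and returns the root element's text.
    static String parseText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        return builder.parse(in).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // A single reference to the full code point U+1F600 is well-formed XML 1.0.
        String ok = parseText("<msg>&#128512;</msg>");
        System.out.println("code point parsed: " + (ok.codePointAt(0) == 0x1F600));

        // ASSUMPTION: the feed sends the surrogate pair D83D/DE00 as two references.
        // Surrogates are not legal XML characters, so the parser should reject this.
        try {
            parseText("<msg>&#55357;&#56832;</msg>");
            System.out.println("surrogate pair parsed without error");
        } catch (SAXParseException e) {
            System.out.println("surrogate pair rejected: " + e.getMessage());
        }
    }
}
```

If the second case fails with the same exception as the stack trace above, that would suggest the feed is emitting surrogate references. If it does not, the cause is something else and this sketch does not apply.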
Is there a quick, localized fix that I can apply without having to redesign and rearchitect the whole application? Also, I would prefer to avoid a regex search/replace hack, since that can introduce other subtle problems.