I maintain a small Java servlet-based webapp that presents forms for input and writes the contents of those forms to MariaDB.
The app runs on a Linux box, although the users visit the webapp from Windows.
Some users paste text copied from MS Word documents into these forms, and when that happens, they get internal exceptions like the following:
    Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Incorrect string value: '\xC2\x96 for...' for column 'ssimpact' at row 1
For instance, I tested it with text like the following:
Project – for
Here the dash is a "long dash" from the MS Word document.
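To see what the bytes in the error message correspond to, here is a minimal sketch. It assumes (this is a guess, not something the stack trace confirms) that the dash arrived as the Windows-1252 byte 0x96 and was decoded as ISO-8859-1 somewhere along the way, which would turn it into the control character U+0096. The class name `DashBytesDemo` and its helper methods are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DashBytesDemo {
    // Hex-dump the bytes of a string in the given charset, e.g. "C2 96".
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // ASSUMPTION: the text was encoded as windows-1252 by the client and then
    // decoded as ISO-8859-1 on the server. This simulates that round trip.
    static String misdecode(String s) {
        byte[] raw = s.getBytes(Charset.forName("windows-1252"));
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String enDash = "\u2013";            // the dash Word inserts
        String garbled = misdecode(enDash);  // becomes U+0096 under the assumption

        System.out.println(hex(enDash, StandardCharsets.UTF_8));   // E2 80 93
        System.out.println(hex(garbled, StandardCharsets.UTF_8));  // C2 96
    }
}
```

If that assumption holds, the `\xC2\x96` in the exception is the UTF-8 form of U+0096 rather than of the dash itself.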
I don't think it's possible to convert the wayward characters in this text to the "correct" characters, so I'm trying to figure out how to produce a reasonable error message that shows a substring of the bad text in question, along with the index of the first bad character.
I noticed postings like this one: "How to determine if a String contains invalid encoded characters".
I thought this would get me close, but it's not quite working.
I'm trying to use the following method:
    private int findUnmappableCharIndex(String entireString) {
        int charIndex;
        for (charIndex = 0; charIndex < entireString.length(); ++charIndex) {
            String currentChar = entireString.substring(charIndex, charIndex + 1);
            CharBuffer out = CharBuffer.wrap(new char[currentChar.length()]);
            CharsetDecoder decoder = Charset.forName("utf-8").newDecoder();
            CoderResult result = decoder.decode(ByteBuffer.wrap(currentChar.getBytes()), out, true);
            if (result.isError() || result.isOverflow() || result.isUnderflow() || result.isMalformed() || result.isUnmappable()) {
                break;
            }
            CoderResult flushResult = decoder.flush(out);
            if (flushResult.isOverflow()) {
                break;
            }
        }
        if (charIndex == entireString.length() + 1) {
            charIndex = -1;
        }
        return charIndex;
    }
This doesn't work. I get "underflow" on the first character, which is a valid character. I'm sure I don't fully understand the decoder mechanism.
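For what it's worth, `CoderResult.isUnderflow()` is also true when the decoder has simply consumed all of its input, so on its own it does not signal an error. Below is a minimal sketch of a different approach, assuming the real question is whether each character can be encoded in the column's character set rather than decoded from UTF-8. The class name, the method name `findUnencodableCharIndex`, and the target charset are placeholders: the actual charset of the `ssimpact` column is not known here, so US-ASCII stands in for it purely for demonstration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class UnencodableCharFinder {
    /**
     * Returns the index of the first character that cannot be encoded in
     * the given charset, or -1 if the whole string is encodable.
     * ASSUMPTION: "target" approximates the database column's charset.
     */
    static int findUnencodableCharIndex(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        int i = 0;
        while (i < text.length()) {
            // Step by code point so surrogate pairs are tested as a unit.
            int codePoint = text.codePointAt(i);
            int len = Character.charCount(codePoint);
            if (!encoder.canEncode(text.subSequence(i, i + len))) {
                return i;
            }
            i += len;
        }
        return -1;
    }

    public static void main(String[] args) {
        String pasted = "Project \u2013 for";
        // US-ASCII is only a stand-in for the real column charset.
        int index = findUnencodableCharIndex(pasted, StandardCharsets.US_ASCII);
        System.out.println(index); // 8, the position of the dash
    }
}
```

The returned index can then be used to cut a short substring around the offending character for the error message.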