Function being tested:
    public static String removeNonprintableCharacters(String input) {
        StringBuilder newString = new StringBuilder(input.length());
        for (int offset = 0; offset < input.length();) {
            int codePoint = input.codePointAt(offset);
            offset += Character.charCount(codePoint);
            // Replace invisible control characters and unused code points
            switch (Character.getType(codePoint)) {
                case Character.CONTROL:     // \p{Cc}
                case Character.FORMAT:      // \p{Cf}
                case Character.PRIVATE_USE: // \p{Co}
                case Character.SURROGATE:   // \p{Cs}
                case Character.UNASSIGNED:  // \p{Cn}
                    newString.append("\ufffd");
                    break;
                default:
                    newString.append(Character.toChars(codePoint));
                    break;
            }
        }
        return newString.toString();
    }
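Because removeNonprintableCharacters only inspects the code points of the String it is handed, one way to isolate it from the UTF-8 decoding is to call it on hand-built strings. The sketch below copies the method into a hypothetical class NonprintableDemo (the question's class is Unicode, whose other contents aren't shown) and swaps Character.toChars for the equivalent appendCodePoint. The last check is the interesting one: U+FFFD is an Other Symbol (So), not one of the replaced categories, so the method should leave "\ufffda" unchanged.

```java
final class NonprintableDemo {
    // Copy of removeNonprintableCharacters from the question; equivalent behavior.
    static String removeNonprintableCharacters(String input) {
        StringBuilder newString = new StringBuilder(input.length());
        for (int offset = 0; offset < input.length();) {
            int codePoint = input.codePointAt(offset);
            offset += Character.charCount(codePoint);
            switch (Character.getType(codePoint)) {
                case Character.CONTROL:     // Cc
                case Character.FORMAT:      // Cf
                case Character.PRIVATE_USE: // Co
                case Character.SURROGATE:   // Cs
                case Character.UNASSIGNED:  // Cn
                    newString.append('\ufffd');
                    break;
                default:
                    newString.appendCodePoint(codePoint);
                    break;
            }
        }
        return newString.toString();
    }

    public static void main(String[] args) {
        // Strings are passed in directly, so no byte decoding is involved here.
        System.out.println(removeNonprintableCharacters("\u0000").equals("\ufffd"));     // NUL (Cc) is replaced
        System.out.println(removeNonprintableCharacters("a\u200bb").equals("a\ufffdb")); // ZERO WIDTH SPACE (Cf) is replaced
        System.out.println(removeNonprintableCharacters("\ufffda").equals("\ufffda"));   // U+FFFD (So) and 'a' (Ll) are kept
    }
}
```

Each line should print true. If so, the method cannot turn "\ufffda" into "\ufffd" by itself, which suggests the JDK 7/8 difference comes from what new String(bytes, "UTF-8") returns rather than from this method.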
Test method:
    @Test
    public void testRemoveNonprintableCharacters() throws UnsupportedEncodingException {
        assertEquals("\ufffd", r(new byte[]{0}));
        // jdk7:
        //assertEquals("\ufffd", r(new byte[]{-7, 'a'}));
        // jdk8: (???)
        assertEquals("\ufffda", r(new byte[]{-7, 'a'}));
    }

    private String r(byte[] bytes) throws UnsupportedEncodingException {
        return Unicode.removeNonprintableCharacters(new String(bytes, "UTF-8"));
    }
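To look at the decoding step on its own, a CharsetDecoder configured with CodingErrorAction.REPORT throws a MalformedInputException whose getInputLength() says how many bytes it treated as one malformed sequence. The sketch below is a minimal probe for that; the class name DecoderProbe and its helpers are illustrative, not from the question. Running it under each JDK and comparing the output shows whether 0xF9 is flagged alone or together with the following 'a'.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

final class DecoderProbe {
    // Returns the length in bytes of the first malformed sequence the UTF-8
    // decoder reports, or 0 if the whole input decodes cleanly.
    static int malformedLength(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return 0;
        } catch (MalformedInputException e) {
            return e.getInputLength();
        } catch (CharacterCodingException e) {
            // Unmappable input is not expected when decoding UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Renders each UTF-16 char as U+XXXX so outputs are easy to compare across JDKs.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {-7, 'a'}; // 0xF9 0x61, the input from the test
        System.out.println("java.version:     " + System.getProperty("java.version"));
        System.out.println("malformed length: " + malformedLength(bytes));
        System.out.println("decoded:          " + codeUnits(new String(bytes, StandardCharsets.UTF_8)));
    }
}
```

On JDK 8 and later this should report a malformed length of 1 and decode to U+FFFD U+0061, which matches the jdk8 assertion in the test.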
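As the test shows, decoding the bytes {-7, 'a'} (0xF9 0x61) as UTF-8 and passing the result through removeNonprintableCharacters yields "\ufffd" on JDK 7 but "\ufffda" on JDK 8. Why did the result change after upgrading the JVM to Java 8?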