To prevent HTML code injection and cross-site scripting, a filter for service requests escapes certain characters using StringEscapeUtils.escapeHtml(text).
However, this also escapes UTF-8 characters such as äöü. My current workaround is an exclude list: before calling StringEscapeUtils.escapeHtml I replace each excluded character with its hash code, and after the call I convert the hash values back to the original characters. That solves the problem, but it is not a very elegant solution!
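To illustrate the unwanted escaping (a minimal example, assuming Commons Lang 2.x):

// ü and ß carry no markup meaning, yet they are turned into entities as well
String escaped = StringEscapeUtils.escapeHtml("Grüße <b>alle</b>");
// result: "Gr&uuml;&szlig;e &lt;b&gt;alle&lt;/b&gt;"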
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang.StringEscapeUtils;

String[] excludeList = {"ü", "Ü", "ö", "Ö", "ä", "Ä", "ß"};

private static String escapeHtml(String text, String[] exclusionList) {
    // remembers hashCode -> original character so the substitution can be undone
    TreeMap<Integer, String> excludeTempMap = new TreeMap<Integer, String>();

    // replace every occurrence of an excluded character with its hash code
    for (String excludePart : exclusionList) {
        Matcher matcher = Pattern.compile(Pattern.quote(excludePart)).matcher(text);
        while (matcher.find()) {
            String match = matcher.group();
            Integer matchHash = match.hashCode();
            text = matcher.replaceFirst(String.valueOf(matchHash));
            excludeTempMap.put(matchHash, match);
            matcher.reset(text); // re-scan the modified text
        }
    }

    // escape the HTML-relevant characters
    text = StringEscapeUtils.escapeHtml(text);

    // restore the excluded characters from their hash codes
    for (Map.Entry<Integer, String> excludeEntry : excludeTempMap.entrySet()) {
        text = text.replaceAll(
                String.valueOf(excludeEntry.getKey()),
                excludeEntry.getValue());
    }
    return text;
}
Does anyone have a tip for a better solution? Is there a library that can whitelist language-specific characters so that only the HTML-relevant characters are escaped?
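To make clearer what I am after: ideally only the characters that are actually dangerous in HTML would be escaped, roughly like this hand-rolled sketch (escapeHtmlKeepUmlauts is just an illustrative name, not an existing API):

// hypothetical helper: escape only the HTML-significant characters and
// leave everything else (including ä, ö, ü, ß) untouched
private static String escapeHtmlKeepUmlauts(String text) {
    StringBuilder sb = new StringBuilder(text.length());
    for (int i = 0; i < text.length(); i++) {
        char c = text.charAt(i);
        switch (c) {
            case '&':  sb.append("&amp;");  break;
            case '<':  sb.append("&lt;");   break;
            case '>':  sb.append("&gt;");   break;
            case '"':  sb.append("&quot;"); break;
            case '\'': sb.append("&#39;");  break;
            default:   sb.append(c); // umlauts and other text pass through unchanged
        }
    }
    return sb.toString();
}

But I would prefer a maintained library over rolling something like this myself.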