To prevent HTML injection and cross-site scripting, a filter on incoming service requests escapes certain characters using `StringEscapeUtils.escapeHtml(text)`.

However, this also escapes non-ASCII characters such as ä, ö, and ü. As a workaround, I pass an exclusion list, replace each listed character with its hash code before calling `StringEscapeUtils.escapeHtml`, and convert the hash values back to the original characters after the call. This solves the problem, but it is not a very elegant solution!

    String[] excludeList = {"ü", "Ü", "ö", "Ö", "ä", "Ä", "ß"};

    private static String escapeHtml(String text, String[] exclusionList) {
        TreeMap<Integer, String> excludeTempMap = new TreeMap<Integer, String>();

        // replace characters from exclusionList in the text with their equivalent hashCode
        for (String excludePart : exclusionList) {
            Matcher matcher = Pattern.compile(excludePart, Pattern.MULTILINE).matcher(text);

            while (matcher.find()) {
                String match = matcher.group();
                Integer matchHash = match.hashCode();

                text = matcher.replaceFirst(String.valueOf(matchHash));

                excludeTempMap.put(matchHash, match);

                matcher.reset(text);
            }
        }

        // escape malicious html characters
        text = StringEscapeUtils.escapeHtml(text);

        // replace back characters from exclusionList from hash values to string
        for (Map.Entry<Integer, String> excludeEntry : excludeTempMap.entrySet()) {
            text = text.replaceAll(
                String.valueOf(excludeEntry.getKey()),
                excludeEntry.getValue()
            );
        }

        return text;
    }

Does anyone have a tip for a better way to achieve this? Is there a library that can whitelist certain language-specific characters?

orcl user
  • Possible duplicate of https://stackoverflow.com/questions/59280607/stringescapeutils-not-handling-utf-8 – AlgorithmFromHell Jul 28 '21 at 18:15
  • Every web page template language has a way of doing this. Are you providing your HTML content from a subclass of HttpServlet? Or are you using a template language like JSP, JSF, Thymeleaf, Freemarker, etc.? – VGR Jul 28 '21 at 18:19
  • What is `StringEscapeUtils`? – Andreas Jul 28 '21 at 19:05
  • It is a java application and the validation takes place in a class which extends "HttpServletRequestWrapper" – orcl user Jul 29 '21 at 14:24
  • You haven't provided any sample data for testing, but I think you might be able to achieve what you want simply by calling `StringEscapeUtils.escapeJava()` to escape your text, and then calling `UnicodeUnescaper.translate()` to unescape characters such as "ü" and "ß". While it's not a very elegant approach, you would only need to write 2 lines of code to replace what you have in the OP! See [this answer for sample code](https://stackoverflow.com/a/59332501/2985643). If that doesn't help, please update your question with some specific sample data, along with the desired result after escaping. – skomisa Dec 16 '22 at 07:56
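
For comparison with the hash-code workaround in the question, here is a minimal sketch of a different approach: escape only the characters that are significant in HTML markup and leave every other character, including ä, ö, ü, and ß, untouched, so no exclusion list is needed. The class name `MinimalHtmlEscaper` and the method name `escapeHtmlMinimal` are hypothetical and not part of any library. The sketch assumes the escaped text ends up in element content or quoted attribute values, and that the page is served with an encoding such as UTF-8 that can represent the umlauts directly.

```java
// Hypothetical helper, not part of Apache Commons or any other library.
final class MinimalHtmlEscaper {

    private MinimalHtmlEscaper() {
        // utility class, no instances
    }

    /**
     * Escapes only the five characters that are significant in HTML markup.
     * All other characters, including non-ASCII letters, are copied unchanged.
     */
    static String escapeHtmlMinimal(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                case '\'':
                    out.append("&#39;");
                    break;
                default:
                    out.append(c); // umlauts and other non-ASCII characters pass through
            }
        }
        return out.toString();
    }
}
```

With this sketch, `MinimalHtmlEscaper.escapeHtmlMinimal("<b>Müller & Söhne</b>")` returns `&lt;b&gt;Müller &amp; Söhne&lt;/b&gt;`. Unlike the hash-code substitution, it does not rewrite digit sequences that happen to already be in the input. Whether escaping these five characters is sufficient depends on where the value is finally rendered; JavaScript or URL contexts would need their own encoding.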

0 Answers