
I'm trying to XML-escape a Japanese string so that it is displayed as normal Japanese text and not as numeric character references. I can't use apache.commons.lang3; apache.commons.lang is preferred. If you have any other suggestions that don't use this library, feel free to share. Thanks in advance!

final String xmlToEscape = "言語が良くない";
final String escapedXml = StringEscapeUtils.escapeXml(xmlToEscape);

Prints:

&#35328;&#35486;&#12364;&#33391;&#12367;&#12394;&#12356;

Should print:

言語が良くない

Thelouras
    This might be helpful, [check out this solution](https://stackoverflow.com/questions/8984875/stringescapeutils-escapexml-is-converting-utf8-characters-which-it-should-not) – Omoro Jan 24 '19 at 11:49

1 Answer


StringEscapeUtils.escapeXml() in apache.commons.lang always escapes non-ASCII characters.
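To see where the numbers in the question's output come from, here is a minimal sketch that only mimics the non-ASCII part of that behaviour. It assumes commons-lang 2.x writes every character above 0x7F as a decimal numeric character reference; the class and method names are hypothetical, and the sketch deliberately does not escape `<`, `>` or `&`.

```java
// Hypothetical demo class; this is not commons-lang code.
final class NumericReferenceDemo {
    private NumericReferenceDemo() {}

    /** Replaces every char above 0x7F with a decimal reference such as &#35328; */
    static String toNumericReferences(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c > 0x7F) {
                // (int) c is the UTF-16 code unit, e.g. 35328 for U+8A00
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints "&#35328;&#35486;&#12364;&#33391;&#12367;&#12394;&#12356;"
        System.out.println(toNumericReferences("言語が良くない"));
    }
}
```

The result is the same string the question reports, so the output there is valid XML that a parser would decode back to the Japanese text. It is just not readable as raw text.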

If you don't want the Japanese characters to be escaped, pass only the ASCII characters of the string to StringEscapeUtils.escapeXml(), like this:

package org.example;

import java.util.Arrays;

import org.apache.commons.lang.StringEscapeUtils;

public class Test {
    public static void main(String[] args) {
        // Prints "&#35328;&#35486;&#12364;&#33391;&#12367;&#12394;&#12356; &lt;ABC&gt;"
        System.out.println(StringEscapeUtils.escapeXml("言語が良くない <ABC>"));
        // Prints "言語が良くない &lt;ABC&gt;"
        System.out.println(escapeXml("言語が良くない <ABC>"));
    }

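    // Splits the input into single-character strings, escapes each one separately, and joins the results.
    public static String escapeXml(String str) {
        return Arrays.stream(str.split(""))
                .map(s -> escapeCharacter(s))
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                .toString();
    }

    // Escapes a single ASCII character with commons-lang; returns anything else unchanged.
    public static String escapeCharacter(String str) {
        if (str.matches("\\p{ASCII}")) {
            return StringEscapeUtils.escapeXml(str);
        } else {
            return str;
        }
    }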
}
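
Since the question also asks for suggestions outside the library, here is a minimal sketch in plain Java with no streams and no commons-lang. It assumes that escaping the five predefined XML entities is enough for your use case and that the output is written in an encoding that can represent Japanese, such as UTF-8. The class and method names are hypothetical, and it does not filter control characters that are illegal in XML.

```java
// Hypothetical helper; not part of commons-lang.
final class PlainXmlEscaper {
    private PlainXmlEscaper() {}

    /** Escapes only & < > " ' and leaves every other character, including Japanese, as it is. */
    static String escapeXml(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);        // non-special characters pass through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints "言語が良くない &lt;ABC&gt;"
        System.out.println(escapeXml("言語が良くない <ABC>"));
    }
}
```

This walks the string once instead of running a regex per character, and it also works on Java versions without streams.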
SATO Yusuke