I cannot compile this:

String[][] UMLAUT_REPLACEMENTS = {{"\u0022", "&#34;"},{"\u0021", "&#33;"}};

I tried to escape the special character with \\, but it had no effect.

This is the error output:

Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project opk-application-util: Compilation failure: Compilation failure: 
[ERROR] /C:/eplatform/git-repos/opk-backend/opk-application-util/src/main/java/util/SonderZeichenFilter.java:[50,41] '}' expected
[ERROR] /C:/eplatform/git-repos/opk-backend/opk-application-util/src/main/java/ch/opk/util/SonderZeichenFilter.java:[50,45] ';' expected
[ERROR] /C:/eplatform/git-repos/opk-backend/opk-application-util/src/main/java/ch/opk/util/SonderZeichenFilter.java:[50,46] illegal character: '#'
[ERROR] /C:/eplatform/git-repos/opk-backend/opk-application-util/src/main/java/ch/opk/util/SonderZeichenFilter.java:[50,47] ';' expected
[ERROR] /C:/eplatform/git-repos/opk-backend/opk-application-util/src/main/java/opk/util/SonderZeichenFilter.java:[50,50] unclosed string literal
Franc90
  • I guess there is no need to escape the ampersand character – edwgiz Sep 04 '20 at 10:07
  • Yes - this was an editing mistake here. It fails this way: `String[][] UMLAUT_REPLACEMENTS = {{"\u0022", "&#34;"},{"\u0021", "&#33;"}};` – Franc90 Sep 04 '20 at 10:59

3 Answers

In Java, Unicode escape sequences (\uXXXX) are translated in an early pre-processing step, before the compiler tokenizes the source and therefore before String literal escape sequences such as \" are processed. So when the compiler reaches "\u0022", it actually sees the characters """: an empty String literal (two double quotes) followed by the opening quote of another String literal. The rest of the line is then parsed differently from what you intended, and it ends up with an odd number of double quotes, hence the error "unclosed string literal".

This is also a somewhat common cause of malformed Javadoc (the author wants to write a literal \uXXXX, but the generated HTML contains the corresponding Unicode character instead), and most IDEs are confused by it as well. For example, \u0063lass MyClass {} is valid Java source code, because \u0063 is c.
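A minimal sketch of that translation order, using a hypothetical class name `EscapeOrderDemo` that is not part of the question. Because the escape is replaced before the compiler splits the source into tokens, it can appear inside an identifier as well as inside a String literal:

```java
class EscapeOrderDemo {
    // The escape below stands for the letter 'c', so this declares a field named "count".
    static int \u0063ount = 3;

    // Inside a String literal, the escape for '!' yields the same String as writing '!' directly.
    static String bang() {
        return "\u0021";
    }

    public static void main(String[] args) {
        System.out.println(count);               // prints 3
        System.out.println(bang().equals("!"));  // prints true
    }
}
```

The compiled class contains no trace of the escapes; they only exist in the source file.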

In your case you can use the escape sequence \" to write a literal ". This also improves readability, because not everyone knows the Unicode code point of ". Similarly, \u0021 can be written as !, since that character has no special meaning inside a Java String. Your code could therefore be written like this:

String[][] UMLAUT_REPLACEMENTS = {{"\"", "&#34;"},{"!", "&#33;"}};

If you want the literal text \uXXXX inside a Java String, you have to escape the backslash by preceding it with another \: "\\uXXXX"
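To check the difference between the two forms, here is a small sketch. The class name `QuoteEscapeDemo` is made up for illustration, and the entity strings are assumed from the rest of this thread:

```java
class QuoteEscapeDemo {
    // Same shape as the array from the question, with the quote written as \" and '!' written directly.
    static final String[][] REPLACEMENTS = {{"\"", "&#34;"}, {"!", "&#33;"}};

    // A backslash followed by "u0021": six characters, not an exclamation mark.
    static final String LITERAL_ESCAPE = "\\u0021";

    public static void main(String[] args) {
        System.out.println(REPLACEMENTS[0][0].length());        // prints 1
        System.out.println((int) REPLACEMENTS[0][0].charAt(0)); // prints 34, the code point of the double quote
        System.out.println(LITERAL_ESCAPE.length());            // prints 6
    }
}
```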

Marcono1234
  • Hi! Sorry, I had placed the backslash here for editing reasons. I actually use exactly this code and get the error above: `String[][] UMLAUT_REPLACEMENTS = {{"\u0022", "&#34;"},{"\u0021", "&#33;"}};` – Franc90 Sep 04 '20 at 10:24
  • @FrancescoRovetto, thanks for the clarification. I had misunderstood your problem and have now updated my answer accordingly. – Marcono1234 Sep 04 '20 at 11:19
  • Thank you so much!! \\uXXXX would remove the error, but then Java somehow no longer recognizes it as Unicode. But we are close to the solution :-) Unfortunately, we cannot use the plain characters like ! or " directly. I'm looking for a way to use Unicode escapes in a String array... – Franc90 Sep 04 '20 at 13:38
  • @FrancescoRovetto could you please describe the desired outcome a little bit more in detail then? Note that e.g. `"\u0021"` and `"!"` are in the compiled class and at runtime exactly the same, so if you only want to create a String containing `"!"` then there is no need to use a unicode escape. If you want `\u0021` to be the literal value of the String, then you need to escape it. Just try it out, e.g. use `System.out.println("\\u0021");` – Marcono1234 Sep 04 '20 at 14:14
  • Yes, actually there are only problems with the characters that have a special meaning in Java; for example, \u0022 causes an error. But I have now found a solution, which I will present in an answer. :-) – Franc90 Sep 04 '20 at 15:20

The issue seems to be the "\u0022" String: the Java compiler translates Unicode escape sequences into the corresponding characters before it parses the code, which can lead to errors like this one.

https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.10.6

See also the related question: Compile time error while adding unicode \u0022

So "\u0022" must be replaced with "\"".
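If the numeric code points should stay visible in the source, one option is to build the Strings from the numbers at run time instead of using source-level escapes. This is only a sketch; the class name `CodePointDemo` and the helper `fromCodePoint` are hypothetical, and the entity strings are assumed from the rest of this thread:

```java
class CodePointDemo {
    // Builds a String from a numeric code point at run time, so no escape appears in the source.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String[][] replacements = {
            {fromCodePoint(0x22), "&#34;"},
            {fromCodePoint(0x21), "&#33;"},
        };
        System.out.println(replacements[0][0].equals("\"")); // prints true
        System.out.println(replacements[1][0].equals("!"));  // prints true
    }
}
```

The hexadecimal literals are ordinary int values, so the compiler never treats them as characters of the source text.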

edwgiz

I found the solution!

The reason why String[][] UMLAUT_REPLACEMENTS = {{"\u0022", "&#34;"},{"\u0021", "&#33;"}}; did not work is that \u0022 is already interpreted as " at compile time. The literal then reads """, which is an error because the quote in the middle would need to be escaped.

But if you escape the backslash (\\u0022), the sequence is no longer recognized as a character.

There is still a solution, though, which I applied.

By the way, the purpose of this solution is to mask all special characters in the Latin/ASCII range except the very simple ones.

First, you declare a String array:

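    // Assumption: StringEscapeUtils comes from Apache Commons Lang
    // (import org.apache.commons.lang3.StringEscapeUtils;); the library is not named in this answer.
    public String escapeHtml(String input) {

        String escapedHtml = input;

        // Keys are Unicode escapes written with a doubled backslash; they are unescaped at run time.
        // "&" must be replaced first; otherwise the "&" of entities inserted earlier would be escaped again.
        // Each entity ends with ";" so that a following digit is not read as part of the number.
        String[][] UMLAUT_REPLACEMENTS =
                {
                        {"\\u0026", "&#38;"},
                        {"\\u0021", "&#33;"},
                        {"\\u0022", "&#34;"},
                        {"\\u0024", "&#36;"},
                        {"\\u0025", "&#37;"},
                        {"\\u0027", "&#39;"},
                        {"\\u0028", "&#40;"},
                };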

Then you look for the characters and replace them with the HTML entities, using StringEscapeUtils.unescapeJava(INPUT) to unescape each \uXXXX key first:

        for (int i = 0; i < UMLAUT_REPLACEMENTS.length; i++) {
            // Turn the escaped key (backslash, "u", four hex digits) into the actual character.
            String unescapedSign = StringEscapeUtils.unescapeJava(UMLAUT_REPLACEMENTS[i][0]);
            escapedHtml = escapedHtml.replace(unescapedSign, UMLAUT_REPLACEMENTS[i][1]);
        }

        return escapedHtml;
    }
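A chain of sequential replace calls is sensitive to order: an "&" inserted by an earlier entity can be escaped a second time by a later rule for "&". A single pass over the input avoids that. The following is only a sketch under that assumption, with a hypothetical class name `HtmlEscapeSketch` and no external library; it covers the same characters as the array above (U+0021 to U+0028 without '#'):

```java
class HtmlEscapeSketch {
    // Replaces the characters from U+0021 to U+0028, except '#', with numeric HTML entities.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= 0x21 && c <= 0x28 && c != '#') {
                // The decimal code point is computed from the character itself.
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Say &#34;hi&#34; &#38; go&#33;
        System.out.println(escapeHtml("Say \"hi\" & go!"));
    }
}
```

Each input character is looked at exactly once, so text that was already written to the output is never escaped again.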


Thank you for your help!!
Franc90
  • Have you tested whether the other answers are not working for you? Because what you are doing is `"\\u0021"` -StringEscapeUtils.unescapeJava(...)-> `"!"` -> _replace_. This seems to complicate things unnecessarily when you can just write `"!"` in the first place, omitting `unescapeJava` (unless there is other code using `UMLAUT_REPLACEMENTS` which is not shown here). – Marcono1234 Sep 04 '20 at 15:52
  • Yeah, it would work as long as the source is only deployed locally. Unicode escapes are just safer: if something changes with the encoding, all the characters are gone. – Franc90 Sep 07 '20 at 07:35
  • Both `!` and `"` are ASCII characters so I doubt that there is any (commonly used) encoding which could mess them up. Note that in Java Strings are always UTF-16, so unless you are talking about encoding issues affecting the complete source code (and not only the String literals), there should not be any issues. – Marcono1234 Sep 07 '20 at 10:22