I am developing an Android application with Android Studio and I have a class with a static property like this:

public static final Pattern DIACRITICS_AND_FRIENDS = Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

Everything is working fine when I debug the app on my physical device - API32.

But when I try to debug on an emulator API28, I get an exception whenever I try to reference any static property of the 'static' class.

FATAL EXCEPTION: main Process: com.example.myapp, PID: 10635
java.lang.ExceptionInInitializerError 
at com.example.myapp.MainActivity.onCreate(MainActivity.java:40)
at android.app.Activity.performCreate(Activity.java:7136) 
... 
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858) 
Caused by: java.util.regex.PatternSyntaxException: U_ILLEGAL_ARGUMENT_ERROR
[\p{InCombiningDiacriticalMarks}\p{IsLm}\p{IsSk}]+
at java.util.regex.Pattern.compileImpl(Native Method)
at java.util.regex.Pattern.compile(Pattern.java:1344)
at java.util.regex.Pattern.<init>(Pattern.java:1328)
at java.util.regex.Pattern.compile(Pattern.java:950)
at com.example.myapp.MyStaticClass.<clinit>(MyStaticClass.java:21)
... 16 more

How can I fix this?

EDIT: on an API 30 emulator it does not throw an exception, but this line no longer works as expected (instead of removing only the diacritics, it removes everything):

str = DIACRITICS_AND_FRIENDS.matcher(str).replaceAll("")
EstevaoLuis

1 Answer

I suspect the regex syntax you are using is not supported on all versions of Android. The \p{...} Unicode property escapes are part of Java's regex syntax, but they were not fully implemented on Android until API level 29.

That means that on any API level lower than 29, the pattern "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+" throws a PatternSyntaxException (as you saw on API 28), and on API level 29 or 30 it compiles but is interpreted incorrectly (as you saw on API 30).

To solve your problem, you can use Java's Normalizer class (java.text.Normalizer), which decomposes accented characters so that the diacritical marks can be stripped from a string, as described in the Baeldung article "Remove Accents and Diacritics From a String in Java".
However, the behavior of Normalizer also varies depending on the Android API level. The following solution should work for all Android versions:

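import java.text.Normalizer;
import java.util.regex.Pattern;

public static String removeDiacritics(String str) {
    if (android.os.Build.VERSION.SDK_INT >= android.os.Build.VERSION_CODES.KITKAT) {
        // API 19+: compatibility decomposition, then drop every combining mark.
        str = Normalizer.normalize(str, Normalizer.Form.NFKD);
        str = str.replaceAll("\\p{M}", "");
    } else {
        // Below API 19: canonical decomposition, then drop only the
        // Combining Diacritical Marks block (U+0300 to U+036F).
        final Pattern diacritics = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
        str = Normalizer.normalize(str, Normalizer.Form.NFD);
        str = diacritics.matcher(str).replaceAll("");
    }
    return str;
}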

Here, for API 19 (KITKAT) and above, Normalizer.Form.NFKD decomposes the string into base characters and separate combining marks, and then all Unicode marks (\\p{M}) are removed. Note that NFKD is a compatibility decomposition, so it also rewrites compatibility characters such as ligatures into their plain equivalents.
For versions below API 19, the string is decomposed with NFD and only the Combining Diacritical Marks Unicode block is removed (\\p{InCombiningDiacriticalMarks}), as this is the only Unicode property escape that is supported there.
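
To see what the two normalization forms do before the marks are stripped, here is a minimal sketch that runs on a plain JVM, with no Android classes involved. The class name DiacriticsDemo and the helper strip are made up for this illustration; only Normalizer and Pattern come from the standard library.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class DiacriticsDemo {
    // \p{M} matches any Unicode combining mark (categories Mn, Mc and Me).
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");

    // Hypothetical helper: decompose with the given form, then drop the marks.
    public static String strip(String input, Normalizer.Form form) {
        String decomposed = Normalizer.normalize(input, form);
        return MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        String word = "caf\u00e9";    // "cafe" with a precomposed e-acute (U+00E9)
        String ligature = "\ufb01n";  // "fin" written with the fi ligature (U+FB01)

        // NFD splits U+00E9 into 'e' + U+0301, and the mark is then removed.
        System.out.println(strip(word, Normalizer.Form.NFD));      // cafe

        // NFD leaves the ligature alone; NFKD expands it to "fi".
        System.out.println(strip(ligature, Normalizer.Form.NFD).equals(ligature)); // true
        System.out.println(strip(ligature, Normalizer.Form.NFKD)); // fin
    }
}
```

If you only want accents removed and would rather keep ligatures and other compatibility characters untouched, NFD is probably the more conservative choice.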

That will not remove characters categorized as "Letter, modifier" (\\p{IsLm}) or "Symbol, modifier" (\\p{IsSk}) (see for illustration "Remove diacritical marks from Unicode chars"), but it is unclear whether you wanted these removed in the first place, as your original pattern never actually removed them on Android API levels below 29.
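
If you do want modifier letters and modifier symbols removed as well, one possible sketch is to add the short category names \p{Lm} and \p{Sk} (without the Is prefix) to the character class. I have only reasoned about this on a standard JVM; whether Android's regex engine accepts these names on every API level is an assumption you should test on your target devices. ModifierDemo and its two methods are names invented for this example. It uses NFD rather than NFKD so that the modifier characters are not rewritten before the pattern sees them.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class ModifierDemo {
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");
    // Assumption: short category names are accepted by the target regex engine.
    private static final Pattern MARKS_AND_MODIFIERS =
            Pattern.compile("[\\p{M}\\p{Lm}\\p{Sk}]+");

    // Removes combining marks only.
    public static String stripMarks(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).replaceAll("");
    }

    // Removes combining marks, modifier letters (Lm) and modifier symbols (Sk).
    public static String stripMarksAndModifiers(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return MARKS_AND_MODIFIERS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // 'e' + combining acute (Mn), 'p' + modifier letter small h (Lm), '^' (Sk)
        String sample = "e\u0301 p\u02b0 x^2";
        System.out.println(stripMarks(sample));             // keeps U+02B0 and '^'
        System.out.println(stripMarksAndModifiers(sample)); // e p x2
    }
}
```

Be aware that \p{Sk} also covers ordinary ASCII characters such as '^' and '`', so this wider pattern can remove more than you expect.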

Also keep in mind that this depends on the language: for languages like German, you might want to convert 'ü' to 'u', but be aware that for other languages this might be wrong. Technically, the code above is about the removal of diacritics, but linguistically this is a complex issue.
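
One technical side of that caveat: normalization only helps for letters that Unicode defines as a base letter plus a combining mark. Letters such as 'ø' (U+00F8) and 'Ł' (U+0141) have no decomposition, so they pass through unchanged. A small sketch on a plain JVM, where DecompositionLimitsDemo and strip are names made up for this illustration:

```java
import java.text.Normalizer;

class DecompositionLimitsDemo {
    // Hypothetical helper: canonical decomposition, then drop combining marks.
    public static String strip(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // U+00FC decomposes into 'u' + U+0308, so the diaeresis is removed.
        System.out.println(strip("M\u00fcller"));              // Muller

        // U+00F8 has no decomposition, so the string is unchanged.
        System.out.println(strip("S\u00f8ren").equals("S\u00f8ren")); // true

        // U+0141 stays, while U+00F3 and U+017A lose their acute accents.
        System.out.println(strip("\u0141\u00f3d\u017a").equals("\u0141odz")); // true
    }
}
```

If such letters also need an ASCII form, that would take an explicit per-language mapping table, which is beyond what Normalizer provides.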

VonC
  • Thank you very much! This is exactly what I was trying to do. I tested on API 28, 30 and 32 and is working perfectly! – EstevaoLuis Jul 31 '23 at 17:20