
I'm learning a bit about Unicode characters and how they're constructed.

I'm working on a method that will take an input string and convert any "special" characters to their basic equivalent.

For example, I expected each of the "c" variants below to be converted to a plain "c":

"ç" => "c"
"ć" => "c"
"č" => "c"
"ⓒ" => "c"
"" => "c"

However, only "ⓒ" is normalized to "c".

I am using this method:

private String getNormalizedInputText() {
    //String input = getIntent().getStringExtra(Intent.EXTRA_PROCESS_TEXT);
    String input = "ç ć č ⓒ ";

    String normalizedInput = Normalizer.normalize(input, Normalizer.Form.NFKC);

    Log.d("Normalized Input", normalizedInput);

    return normalizedInput;
}
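For comparison, here is a minimal sketch of the decompose-then-strip approach, assuming plain Java with `java.text.Normalizer`. NFKC applies a compatibility decomposition and then recomposes, so "c" plus a combining cedilla ends up as "ç" again, which would explain the result above. NFKD leaves the base letter and the combining mark as separate code points, and the marks can then be removed with the `\p{M}` regex category. The class name `NormalizeSketch` and the helper `stripMarks` are hypothetical names chosen for illustration, and the input uses only the four characters whose code points are unambiguous in the question.

```java
import java.text.Normalizer;

public class NormalizeSketch {

    // Hypothetical helper: decompose with NFKD, then drop every combining mark.
    static String stripMarks(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        // \p{M} matches the Unicode "Mark" categories (combining accents, cedillas, carons, ...).
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // ç (U+00E7), ć (U+0107), č (U+010D), ⓒ (U+24D2)
        String input = "\u00E7 \u0107 \u010D \u24D2";

        // NFKC recomposes the accented letters, so only the circled letter changes.
        System.out.println(Normalizer.normalize(input, Normalizer.Form.NFKC));

        // NFKD followed by mark removal leaves only the base letters.
        System.out.println(stripMarks(input));
    }
}
```

With this input, the first line should come out as "ç ć č c" and the second as "c c c c". A character that has no decomposition mapping at all passes through this sketch unchanged, so it would still need a separate lookup table.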

EDIT: Can this be done mathematically?

tylersDisplayName
  • You may want to try [`NFKD`](https://docs.oracle.com/javase/8/docs/api/java/text/Normalizer.Form.html#NFKD) instead of [`NFKC`](https://docs.oracle.com/javase/8/docs/api/java/text/Normalizer.Form.html#NFKC) if you’re looking for a full decomposition – MTCoster Jan 24 '19 at 17:34
  • Possible duplicate of https://stackoverflow.com/questions/3322152/is-there-a-way-to-get-rid-of-accents-and-convert-a-whole-string-to-regular-lette – Vebbie Jan 24 '19 at 17:35
  • @Vebbie - I actually intended on referencing that question in mine. I've tried that solution but I found that it completely filters out characters such as "" rather than outputting "c" – tylersDisplayName Jan 25 '19 at 14:21
  • @MTCoster - I have tried both of those options and end up with the same result. – tylersDisplayName Jan 25 '19 at 14:21
  • *"Can this be done mathematically?"* - No. – Stephen C Jan 25 '19 at 14:26
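One way to see why arithmetic on code points is unlikely to work is to print how far each variant sits from "c" (U+0063). The sketch below assumes plain Java; `CodePointSketch` and `offsetFromC` are hypothetical names used only for illustration. The offsets differ for every character, so there is no single value to subtract, and the mapping has to come from the Unicode decomposition data instead.

```java
public class CodePointSketch {

    // Hypothetical helper: distance between the first code point of s and 'c' (U+0063).
    static int offsetFromC(String s) {
        return s.codePointAt(0) - 'c';
    }

    public static void main(String[] args) {
        // ç (U+00E7), ć (U+0107), č (U+010D), ⓒ (U+24D2)
        String[] variants = { "\u00E7", "\u0107", "\u010D", "\u24D2" };
        for (String v : variants) {
            // Print the code point in U+XXXX form next to its offset from 'c'.
            System.out.printf("U+%04X offset %d%n", v.codePointAt(0), offsetFromC(v));
        }
    }
}
```

The offsets for these four characters are 132, 164, 170 and 9327.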

0 Answers