I'm learning a bit about UTF-8 characters and how they're constructed.
I'm working on a method that takes an input string and converts any "special" characters to their basic equivalents.
For example, I expected each of the "c" variants below to be converted to "c":
"ç" => "c"
"ć" => "c"
"č" => "c"
"ⓒ" => "c"
"" => "c"
However, only "ⓒ" is normalized to "c".
I am using this method:
private String getNormalizedInputText() {
    //String input = getIntent().getStringExtra(Intent.EXTRA_PROCESS_TEXT);
    String input = "ç ć č ⓒ ";
    String normalizedInput = Normalizer.normalize(input, Normalizer.Form.NFKC);
    Log.d("Normalized Input", normalizedInput);
    return normalizedInput;
}
EDIT: Can this be done mathematically?
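
For reference, here is a minimal sketch of one common approach, not necessarily the only or best one. NFKC recomposes characters after applying compatibility mappings, so "ç" stays "ç", while "ⓒ" has a compatibility mapping to "c" and is converted. NFKD leaves the characters decomposed (for example "c" followed by the combining cedilla U+0327), and the combining marks can then be removed with a regex. As far as I can tell there is no arithmetic relationship between the code points (ç is U+00E7, ć is U+0107, č is U+010D, c is U+0063), so the mapping comes from Unicode's decomposition tables rather than from math. The class name AccentStripper and the method stripMarks are hypothetical names used only for illustration, and the input uses escapes for the four characters that display correctly above.

```java
import java.text.Normalizer;

class AccentStripper {
    // Hypothetical helper for illustration; not part of the original method.
    static String stripMarks(String input) {
        // NFKD applies compatibility mappings and leaves characters decomposed:
        // "ç" becomes "c" + U+0327, and "ⓒ" becomes plain "c".
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        // \p{M} matches Unicode combining marks; removing them keeps the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // ç, ć, č, ⓒ written as escapes so the source file encoding does not matter.
        System.out.println(stripMarks("\u00E7 \u0107 \u010D \u24D2")); // prints "c c c c"
    }
}
```

This only handles characters that Unicode defines a decomposition for. Letters such as "ø" or "ł" have no decomposition and would pass through unchanged, so they would need an explicit lookup table.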