For example:

String text = "Československá obchodní banka";

The string contains characters with diacritics, such as Č and á.

I want to write a function that takes a string such as "Československá obchodní banka" and returns true if the string contains diacritics, and false otherwise.

I have to handle two cases separately: strings that contain diacritics, and strings that contain characters outside the A-Z and a-z ranges.

1) If the string contains diacritics, I have to perform some operation XXXXXX on it.

2) If the string contains characters other than A-Z or a-z but no diacritics, I have to perform some other operation YYYYY.

I have no idea how to do it.

Pramod Kumar
  • What's the use case? By "diacritics" do you really mean that you want to look for letters that contain diacritics, or do you mean *any letter* that is not in the range A-Z? What about non-latin letters like 'じ' which you may argue contain the Japanese equivalent of diacritics? – deceze Jul 03 '12 at 10:52
  • why not check each character in the string and parse it to an int, anything over 127 would be a diacritic – David Kroukamp Jul 03 '12 at 10:53
  • @David That's a little too simplistic and exactly why I was asking what I was asking above. I wasn't aware that "µ" contains diacritics. – deceze Jul 03 '12 at 10:55
  • @deceze Lol, yes true, completely left me at the time of writing :) – David Kroukamp Jul 03 '12 at 10:57
  • I have to handle strings with diacritics and strings containing characters outside the A-Z or a-z range separately. – Pramod Kumar Jul 03 '12 at 10:59

2 Answers

13

One piece of background: Unicode has a single precomposed code point for á, but the same result can also be produced by an a followed by a combining acute accent.
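To make the two representations concrete, here is a minimal sketch. The class name ComposedVsDecomposed and the helper toNfd are made up for illustration; only java.text.Normalizer comes from the JDK.

```java
import java.text.Normalizer;

// Hypothetical demo class, not part of the original answer.
final class ComposedVsDecomposed {
    // Precomposed form: the single code point U+00E1 (á).
    static final String PRECOMPOSED = "\u00E1";
    // Decomposed form: 'a' followed by U+0301 COMBINING ACUTE ACCENT.
    static final String DECOMPOSED = "a\u0301";

    // Canonical decomposition (NFD) splits precomposed letters into base + marks.
    static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // The two strings render the same but are different char sequences.
        System.out.println(PRECOMPOSED.equals(DECOMPOSED));        // false
        System.out.println(PRECOMPOSED.length());                  // 1
        System.out.println(DECOMPOSED.length());                   // 2
        // After NFD normalization they compare equal.
        System.out.println(toNfd(PRECOMPOSED).equals(DECOMPOSED)); // true
    }
}
```

Because both forms can occur in real input, a check for diacritics should not assume either one.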

You can use java.text.Normalizer, as follows:

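import java.text.Normalizer;

public static boolean hasDiacritics(String s) {
    // Decompose each accented letter, e.g. á into a plus a combining acute accent.
    String s2 = Normalizer.normalize(s, Normalizer.Form.NFD);
    // (?s) lets . match line breaks too, so multi-line input is handled.
    return s2.matches("(?s).*\\p{InCombiningDiacriticalMarks}.*");
    // Not used: an equals test would miss input that is already decomposed.
    //return !s2.equals(s);
}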
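
The question asks for two cases to be handled separately. One possible way to build that on the same NFD check is sketched below. The class TextClassifier, the enum Kind and its constant names are hypothetical, and treating whitespace as acceptable is an assumption, since the example string in the question contains spaces.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper, sketched for the two cases in the question.
final class TextClassifier {
    enum Kind { HAS_DIACRITICS, OTHER_NON_LETTER, PLAIN }

    // Combining marks in the block U+0300..U+036F.
    private static final Pattern MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}");
    // Assumption: whitespace is allowed, everything else outside A-Z/a-z is not.
    private static final Pattern NOT_ASCII_LETTER =
            Pattern.compile("[^A-Za-z\\s]");

    static Kind classify(String s) {
        // Case 1: a combining mark shows up after canonical decomposition.
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        if (MARKS.matcher(nfd).find()) {
            return Kind.HAS_DIACRITICS;   // do XXXXXX here
        }
        // Case 2: no diacritics, but some character outside A-Z/a-z.
        if (NOT_ASCII_LETTER.matcher(s).find()) {
            return Kind.OTHER_NON_LETTER; // do YYYYY here
        }
        return Kind.PLAIN;
    }

    public static void main(String[] args) {
        // "Československá obchodní banka", written with Unicode escapes.
        System.out.println(classify("\u010Ceskoslovensk\u00E1 obchodn\u00ED banka"));
        System.out.println(classify("\u00B5")); // micro sign µ, no combining mark
        System.out.println(classify("Obchodni banka"));
    }
}
```

With this sketch the micro sign µ mentioned in the comments lands in the second case, because NFD does not decompose it into a base letter plus a combining mark.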
Joop Eggen
  • I corrected my answer: if the original s already contained a decomposed á (an a followed by a combining mark), an equals test would not detect it. – Joop Eggen Jul 03 '12 at 11:33
5

The Normalizer class seems to be able to accomplish this. Some limited testing indicates that

Normalizer.isNormalized(text, Normalizer.Form.NFD)

might be what you need.
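
A small sketch of how that call behaves, for anyone trying it out. The class IsNormalizedDemo and the method changesUnderNfd are hypothetical names. Note that the call returns true for text that is already in NFD, so the result has to be negated, and input whose accents are already stored as separate combining marks goes undetected.

```java
import java.text.Normalizer;

// Hypothetical demo class showing what isNormalized reports.
final class IsNormalizedDemo {
    // True when NFD normalization would change the string,
    // e.g. because it contains a precomposed letter such as á.
    static boolean changesUnderNfd(String text) {
        return !Normalizer.isNormalized(text, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        System.out.println(changesUnderNfd("banka"));   // false: plain ASCII
        System.out.println(changesUnderNfd("\u00E1"));  // true: precomposed á
        // Already decomposed (a + U+0301), so it counts as normalized.
        System.out.println(changesUnderNfd("a\u0301")); // false
    }
}
```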

Keppil