
Possible Duplicate:
ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ --> n or Remove diacritical marks from unicode chars

How to remove diacritics from strings?

For example, transform á -> a, č -> c, etc., in a way that works for all languages.

I'm doing full-text search and need to ignore any diacritics in the searched text.

Thanks

Pointer Null

1 Answer


On API level 9+ you can use the java.text.Normalizer class, e.g.:

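import java.text.Normalizer;
// NFD splits each accented letter into base letter + combining mark(s); replaceAll() drops the marks.
String normalized = Normalizer.normalize("âbĉdêéè", Normalizer.Form.NFD)
    .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");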

This returns "abcdeee".

(Keyser's linked answer looks better; it cleans up more.)
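To see what each of the two steps contributes, here is a minimal self-contained sketch. The class and method names (`DiacriticsSketch`, `decompose`, `strip`) are made up for illustration and are not part of any API. The sample string is the same one as above, written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.text.Normalizer;

final class DiacriticsSketch {
    // Step 1 only: canonical decomposition (NFD). Each precomposed letter is
    // split into a base letter followed by its combining mark(s). Nothing is
    // removed here, so the text still looks the same when printed.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Step 1 + step 2: decompose, then delete every character from the
    // "Combining Diacritical Marks" block (U+0300..U+036F).
    static String strip(String s) {
        return decompose(s).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // Same sample as in the answer, spelled with escapes.
        String input = "\u00e2b\u0109d\u00ea\u00e9\u00e8";
        System.out.println(input.length());            // 7
        System.out.println(decompose(input).length()); // 12
        System.out.println(strip(input));              // abcdeee
    }
}
```

So `normalize` by itself only changes how the characters are encoded (7 chars become 12); it is the `replaceAll` that actually removes the marks.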

Jens
  • Thanks, that's it! Shame about API 9+, but I can live with it. – Pointer Null May 22 '12 at 11:43
  • Shouldn't `Normalizer.normalize` already remove these special signs? What does it do? It seems not to do anything... – android developer Feb 03 '23 at 13:26
  • Sadly, this doesn't work for Hebrew "Niqqud" (marks that play the role of vowels; they are optional and most people rarely use them). For example, the word "בְּרֵאשִׁית" should become "בראשית". https://en.wikipedia.org/wiki/Diacritic#Hebrew – android developer Feb 04 '23 at 09:21
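
Regarding the Hebrew case: the niqqud points live in the Hebrew block (roughly U+0591..U+05C7), not in the Combining Diacritical Marks block (U+0300..U+036F), so the `\p{InCombiningDiacriticalMarks}` pattern never matches them. One possible workaround, sketched below, is to match by general category instead and drop every nonspacing mark (`\p{Mn}`). The class and method names are hypothetical, and whether removing all such marks is acceptable for every script you index is an assumption you would need to check against your own data.

```java
import java.text.Normalizer;

final class MarkStripper {
    // Assumption: for search purposes it is fine to drop EVERY nonspacing
    // mark (Unicode general category Mn), not just U+0300..U+036F.
    // This covers Latin accents after NFD as well as Hebrew niqqud.
    static String stripAllMarks(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{Mn}+", "");
    }

    public static void main(String[] args) {
        // The word from the comment above, with niqqud, spelled with escapes:
        // bet + sheva + dagesh, resh + tsere, alef, shin + hiriq + shin dot, yod, tav
        String withNiqqud =
                "\u05d1\u05b0\u05bc\u05e8\u05b5\u05d0\u05e9\u05b4\u05c1\u05d9\u05ea";
        String stripped = stripAllMarks(withNiqqud);
        System.out.println(withNiqqud.length()); // 11
        System.out.println(stripped.length());   // 6
        System.out.println(stripped);
    }
}
```

Be aware that this is deliberately broader than the original pattern: in some scripts the nonspacing marks carry meaning you may want to keep, so treat it as a starting point rather than a drop-in replacement.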