3

Possible Duplicate:
ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ --> n or Remove diacritical marks from unicode chars
How to replace special characters in a string?

I would like to format a String such as "I>Télé" into something like "itele". The idea is that I want my String to be lower case (done), without whitespace (done), and without accents or special characters (like >, <, /, %, ~, é, @, ï etc).

It is okay to delete occurrences of special characters, but I want to keep letters while removing their accents (as I did in my example). Here is what I did, but I don't think the right solution is to replace every é, è, ê, ë with "e", then do it again for "i", "a" etc., and then remove every special character...

String name = "I>télé"; // example
String result = name.toLowerCase().replace(" ", "").replace("é","e").........;

The purpose is to produce a valid filename for resources in an Android app, so if you have any other ideas, I'll take them!

Thibault

3 Answers

17

You can use the java.text.Normalizer class to decompose your text into plain Latin characters followed by combining diacritic marks (accents), where possible. So for example, the single-character string "é" would become the two-character string ['e', {COMBINING ACUTE ACCENT}].
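To make the decomposition concrete, here is a minimal sketch. The class name NfdDemo and the helper decompose are illustrative names, not part of the answer; only Normalizer.normalize and Normalizer.Form.NFD come from the JDK.

```java
import java.text.Normalizer;

public class NfdDemo {
    // Hypothetical helper: returns the canonically decomposed (NFD) form of the input.
    public static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // "\u00e9" is the precomposed character é (one char).
        String decomposed = decompose("\u00e9");
        System.out.println(decomposed.length());                        // 2
        System.out.println(decomposed.charAt(0));                       // e
        System.out.println(Integer.toHexString(decomposed.charAt(1))); // 301 (COMBINING ACUTE ACCENT)
    }
}
```

The input is written as a Unicode escape so the result does not depend on the encoding of the source file.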

After you've done this, your String would be a combination of unaccented characters, accent modifiers, and the other special characters you've mentioned. At this point you could filter the characters in your string using only a whitelist to keep what you want (which could be as simple as [A-Za-z0-9] for a regex, depending on what you're after).

An approach might look like:

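import java.text.Normalizer;

String name = "I>télé"; // example
String normalized = Normalizer.normalize(name, Normalizer.Form.NFD); // split accents from base letters
String result = normalized.replaceAll("[^A-Za-z0-9]", ""); // keep ASCII letters and digits only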
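Combined with the lower-casing step from the question, the whole transformation could be sketched as below. ResourceNames and toResourceName are hypothetical names chosen for illustration, and the whitelist [a-z0-9] is one possible choice rather than a requirement.

```java
import java.text.Normalizer;
import java.util.Locale;

public class ResourceNames {
    // Hypothetical helper: decompose accents, lower-case, then keep only a-z and 0-9.
    public static String toResourceName(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Locale.ROOT keeps the result independent of the device's default locale.
        String lower = decomposed.toLowerCase(Locale.ROOT);
        // Whitespace, punctuation and the combining marks all fall outside the whitelist.
        return lower.replaceAll("[^a-z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(toResourceName("I>T\u00e9l\u00e9")); // itele
    }
}
```

Passing Locale.ROOT matters on devices set to Turkish, where the default toLowerCase() maps "I" to a dotless "ı" that the whitelist would then delete.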
Andrzej Doyle
  • Thanks to the links provided, I found [this](http://stackoverflow.com/a/4122207/1520739). The solution would be a combination of this and replaceAll("[^A-Za-z0-9]", ""). Thanks a lot! – Thibault Jul 18 '12 at 08:37
  • +1 for thinking of Normalizer – cl-r Jul 18 '12 at 08:48
  • Thank you very much. This saved me a lot of time and ugly regex! – Jeff.H May 23 '18 at 15:43
1

You can do something like

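// Keep only letters and digits (accented letters still count as letters here)
String res = "";
for (char c : name.toCharArray()) {
    if (Character.isLetter(c) || Character.isDigit(c)) {
        res += c;
    }
}

// Then strip the accents from res using the method below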

http://blog.smartkey.co.uk/2009/10/how-to-strip-accents-from-strings-using-java-6/

public static String stripAccents(String s) {    
    s = Normalizer.normalize(s, Normalizer.Form.NFD);   
    s = s.replaceAll("\\p{InCombiningDiacriticalMarks}+", ""); 
    return s;
}
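Put together, the two steps of this answer might look like the following sketch. The class name AccentStripper and the helpers lettersAndDigits and clean are illustrative names; stripAccents is the method quoted above.

```java
import java.text.Normalizer;

public class AccentStripper {
    // The method quoted above: decompose, then delete the combining marks.
    public static String stripAccents(String s) {
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        return s.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Hypothetical helper: drops everything that is neither a letter nor a digit.
    public static String lettersAndDigits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: filter first, then strip accents from what is left.
    public static String clean(String name) {
        return stripAccents(lettersAndDigits(name));
    }

    public static void main(String[] args) {
        System.out.println(clean("I>T\u00e9l\u00e9")); // ITele
    }
}
```

Unlike an [A-Za-z0-9] whitelist, Character.isLetterOrDigit also accepts non-Latin letters (Cyrillic, Greek and so on), so an extra filter may be needed if the result must be pure ASCII.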
Viktor Mellgren
0

Try using ASCII codes. Maybe this link will help.

Alfa