How to replace characters like "á" with the corresponding English letter

Question

I have a sample String containing characters like á, é, í, ó, ú, ü, ñ, and I want to replace these special characters. For example:
á with a
é with e
and so on.

I have a map with each special character as the key and its replacement as the value.
Now suppose I pass a String such as "novás músíc" into a method. A regex should validate it, and if any of the special characters I mentioned is found, it should be replaced with the mapped character.

Please help me with the regex validation part.

You understand that these are **not** "special" characters, right? And that `novás` is *misspelled* if you change it to `novas` instead? It's 2015, it's completely unnecessary and inappropriate in today's world to force languages to conform to the English alphabet. — T.J. Crowder, Feb 15 '15 at 09:04
A regex is not the right tool to replace a set of characters one by one in a string. It is more efficient and less complex to iterate over the characters and replace the one character if needed. — vanje, Feb 15 '15 at 09:06
@T.J.Crowder there are valid use cases for this, for example I've used it when implementing a search tool - the strings I show to users are always the original ones, but internally I normalise both the documents and the queries so a user whose keyboard doesn't do accents can perform a search without accents and find documents with and vice versa. — Ian Roberts, Feb 15 '15 at 09:43
@IanRoberts: Absolutely, a small number of very limited use cases. But this pervasive belief that these characters are in some way "special" is best refuted barring such a case being cited. — T.J. Crowder, Feb 15 '15 at 09:48
In Danish, one would (when forced) replace "å" with "aa". Search libraries could match å to aa and aa to å with a higher weight than å to a and a to a. — Tom Blodget, Feb 15 '15 at 15:27
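The character-by-character approach suggested by vanje in the comments above could look roughly like the sketch below. The class name `MappedReplacer`, the method `replaceMapped`, and the helper `sampleMap` are hypothetical names chosen for illustration. The map type `Map<Character, String>` is also an assumption, since the question does not show the map's declaration; String values allow multi-letter replacements such as "aa" for "å", as mentioned by Tom Blodget. The sketch further assumes the input uses precomposed characters (a single code unit per accented letter).

```java
import java.util.HashMap;
import java.util.Map;

final class MappedReplacer {

    // Hypothetical sample map; the question's real map is not shown.
    static Map<Character, String> sampleMap() {
        Map<Character, String> map = new HashMap<>();
        map.put('\u00e1', "a");  // á
        map.put('\u00e9', "e");  // é
        map.put('\u00ed', "i");  // í
        map.put('\u00f3', "o");  // ó
        map.put('\u00fa', "u");  // ú
        map.put('\u00fc', "u");  // ü
        map.put('\u00f1', "n");  // ñ
        map.put('\u00e5', "aa"); // å
        return map;
    }

    // Walk the input once; append the mapped replacement when one exists,
    // otherwise keep the original character unchanged.
    static String replaceMapped(String input, Map<Character, String> replacements) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String mapped = replacements.get(c);
            if (mapped != null) {
                out.append(mapped);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "nov\u00e1s m\u00fas\u00edc" is "novás músíc" written with Unicode escapes.
        System.out.println(replaceMapped("nov\u00e1s m\u00fas\u00edc", sampleMap()));
    }
}
```

With the sample map, `replaceMapped` turns "novás músíc" into "novas music". No regex is involved, and characters missing from the map pass through untouched.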

score 3 · Accepted Answer · edited May 23 '17 at 10:27


You can do this via Unicode normalization (decomposing each accented letter into a base letter plus combining marks), followed by a regular expression that removes the combining diacritical marks.

See this question and its accepted answer: "Convert Unicode to ASCII without changing the string length (in Java)"
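As a minimal sketch of that idea (not necessarily the exact code from the linked answer), the example below uses `java.text.Normalizer` to decompose the text to NFD and then strips every combining mark with the regex `\p{M}+`. The class name `AccentStripper` and the method name `stripAccents` are hypothetical.

```java
import java.text.Normalizer;

final class AccentStripper {

    // Decompose accented letters into base letter + combining marks (NFD),
    // then delete the combining marks (Unicode general category M).
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "nov\u00e1s m\u00fas\u00edc" is "novás músíc" written with Unicode escapes.
        System.out.println(stripAccents("nov\u00e1s m\u00fas\u00edc"));
    }
}
```

Here `stripAccents` returns "novas music" for "novás músíc". This only works for letters that have a canonical decomposition: characters such as "ø" or "ß" do not decompose into a base letter plus a mark, so they would pass through unchanged and would still need an explicit map.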

answered Feb 15 '15 at 09:33 by Jherico · edited May 23 '17 at 10:27 by Community

score -1 · Answer 2 · answered Feb 15 '15 at 09:21

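You can use the regex [^\x00-\x7F], which matches any single character outside the ASCII range:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String source = args[0];
Pattern p = Pattern.compile("[^\\x00-\\x7F]");
Matcher m = p.matcher(source);
StringBuffer result = new StringBuffer();

while (m.find()) { // find() must succeed before group() can be called
    if (map.containsKey(m.group())) {
        // Replace with the mapped value (assumes map is a Map<String, String>)
        m.appendReplacement(result, Matcher.quoteReplacement(map.get(m.group())));
    } else {
        // Put a default value for all unmapped characters; "?" is only a placeholder
        m.appendReplacement(result, "?");
    }
}
m.appendTail(result); // copy the text after the last match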

This is based only on the limited information in your question; you would need to elaborate to get a more detailed answer. Note that this regex only distinguishes ASCII from non-ASCII characters: it matches every character outside the ASCII range, whether or not it is in your map.
