Remove accents in string except "ñ"

Question

I have the following example code:

var inputString = "ñaáme";
inputString = inputString.Replace('ñ', '\u00F1');
var normalizedString = inputString.Normalize(NormalizationForm.FormD);
var result = Regex.Replace(normalizedString, @"[^ñÑa-zA-Z0-9\s]*", string.Empty);
return result.Replace('\u00F1', 'ñ'); // naame :(

I need to normalize the text (remove the accents) without removing the "ñ" characters.

I followed this example, but it's for Java and it did not work for me.

I want the result to be "ñaame".
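The question's code returns "naame" because of what FormD does to each character. The short sketch below uses only the standard System.Text normalization API and prints the UTF-16 code units after decomposition. FormD turns ñ into n followed by the combining tilde U+0303. The tilde is not in [ñÑa-zA-Z0-9\s], so the regex deletes it before the final Replace runs. Replace('ñ', '\u00F1') is also a no-op, because the literal 'ñ' already is U+00F1. This assumes the source file is saved as UTF-8 (or another encoding the compiler reads correctly).

```csharp
using System;
using System.Linq;
using System.Text;

// FormD splits ñ into 'n' + U+0303 (combining tilde)
// and á into 'a' + U+0301 (combining acute accent).
string decomposed = "ñaáme".Normalize(NormalizationForm.FormD);

// Print every UTF-16 code unit as U+XXXX.
Console.WriteLine(string.Join(" ", decomposed.Select(c => $"U+{(int)c:X4}")));
// U+006E U+0303 U+0061 U+0061 U+0301 U+006D U+0065
```

Once the tilde has been separated from the n, nothing in the original pattern protects it. The ñ is lost at the regex step, not at the normalization step.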

Wiktor Stribiżew · Accepted Answer · 2017-11-25T17:29:26.967

You can match any Unicode letter other than ñ and the ASCII letters (which need no normalization) with the (?i)[\p{L}-[ña-z]]+ regex, and normalize only those matches to FormD. Then remove all combining (non-spacing) marks from the resulting string.

Use

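using System;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
var inputString = "ñaáme";
// Decompose (FormD) only the letters that are neither ñ nor ASCII letters
var result = string.Concat(Regex.Replace(inputString, @"(?i)[\p{L}-[ña-z]]+", m =>
        m.Value.Normalize(NormalizationForm.FormD)
    )
    .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)); // drop combining marks
Console.Write(result); // ñaame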

See the C# demo
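If you would rather not depend on .NET's character class subtraction, you can sketch the same result without a regex. This is a different technique from the one in the answer, and RemoveAccentsExceptEnye is a hypothetical helper name. The helper decomposes the whole string with FormD, joins any n/N that is immediately followed by the combining tilde U+0303 back into ñ/Ñ, and drops every other non-spacing mark.

```csharp
using System;
using System.Globalization;
using System.Text;

Console.WriteLine(RemoveAccentsExceptEnye("ñaáme")); // ñaame
Console.WriteLine(RemoveAccentsExceptEnye("ÑANDÚ")); // ÑANDU

// Hypothetical helper (not from the answer): removes accents but keeps ñ/Ñ.
static string RemoveAccentsExceptEnye(string input)
{
    // FormD: ñ -> n + U+0303, á -> a + U+0301, and so on.
    string decomposed = input.Normalize(NormalizationForm.FormD);
    var sb = new StringBuilder(decomposed.Length);
    for (int i = 0; i < decomposed.Length; i++)
    {
        char c = decomposed[i];
        // Join n/N followed by a combining tilde back into the precomposed ñ/Ñ.
        if ((c == 'n' || c == 'N') && i + 1 < decomposed.Length && decomposed[i + 1] == '\u0303')
        {
            sb.Append(c == 'n' ? 'ñ' : 'Ñ');
            i++; // skip the tilde that was just consumed
            continue;
        }
        // Drop every other combining (non-spacing) mark; keep everything else.
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            sb.Append(c);
    }
    return sb.ToString();
}
```

The two versions behave differently on input that is already decomposed. This loop turns "n" followed by U+0303 into ñ. The regex version does not match either character, so its NonSpacingMark filter reduces the same input to a plain "n".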

Pattern description

(?i) - case-insensitive modifier
[ - start of a character class
- \p{L} - any Unicode letter
- -[ - other than (start of the subtracted class)
  - ña-z - ñ and ASCII letters
- ] - end of the subtracted class
]+ - end of the character class, matched 1 or more times
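To check what the subtraction class accepts, the sketch below tests single characters against the answer's class without the + quantifier. A True result means the character would be matched and therefore decomposed. For example, á, ü and ç are matched. False means the character is left untouched, as with ñ, ASCII letters and digits.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's character class, without the quantifier.
var cls = new Regex(@"(?i)[\p{L}-[ña-z]]");

foreach (char c in "ñaáüç1")
{
    // True: the char is a letter outside [ña-z] and would be normalized.
    Console.WriteLine($"{c}: {cls.IsMatch(c.ToString())}");
}
```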

edited Nov 25 '17 at 17:29

answered Nov 25 '17 at 17:20

Wiktor Stribiżew

Your answer is very good. Your code worked for me. Thank you very much. – HenryGuillen17 Nov 27 '17 at 13:02
Sure. I would never have arrived at the solution that way, even though I had looked at some regular expressions. Again, thank you very much. – HenryGuillen17 Nov 27 '17 at 13:04
