
I'm using ASP.NET 4 and C#.

I have a string that can contain:

  • Special Characters, like: !"£$%&/()/#
  • Accented letters, like: àòèù
  • Spaces, like " " (one or more consecutive)

Example string:

#Hi this          is  rèally/ special strìng!!!

I would like to:

a) Remove all special characters, like:

Hi this          is  rèally special strìng

b) Convert all accented letters to non-accented letters, like:

Hi this          is  really special string

c) Replace each run of spaces with a single dash (-), like:

Hi-this-is-really-special-string

My aim is to create a string suitable for a URL path, for better SEO.

Any idea how to do this with regular expressions or another technique?

Thanks for your help on this!

Conspicuous Compiler
GibboK
  • possible duplicate of [Ignoring accented letters in string comparison](http://stackoverflow.com/questions/359827/ignoring-accented-letters-in-string-comparison) – GvS Aug 09 '11 at 06:51

3 Answers


Similar to mathieu's answer, but more custom-made for your requirements. This solution first strips special characters and diacritics from the input string, and then replaces each run of whitespace with a single dash:

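using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

string s = "#Hi this          is  rèally/ special strìng!!!";

// FormD splits an accented letter into base letter + combining mark (è -> e + U+0300)
string normalized = s.Normalize(NormalizationForm.FormD);

// Keep only letters and spaces: combining marks, punctuation and symbols are dropped
StringBuilder resultBuilder = new StringBuilder();
foreach (var character in normalized)
{
    UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(character);
    if (category == UnicodeCategory.LowercaseLetter
        || category == UnicodeCategory.UppercaseLetter
        || category == UnicodeCategory.SpaceSeparator)
        resultBuilder.Append(character);
}

// Trim so leading/trailing spaces don't become dashes, then collapse each whitespace run into one dash
string result = Regex.Replace(resultBuilder.ToString().Trim(), @"\s+", "-");
// result: "Hi-this-is-really-special-string"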

See it in action at ideone.com.
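To see why the category filter above also removes the accents, here is a small sketch (not part of the answer) that prints the Unicode category of each character after FormD normalization. The sample string `"rè/ !"` is just an illustration. The accented `è` becomes a plain `e` (LowercaseLetter) followed by a combining grave accent U+0300 (NonSpacingMark), which the filter does not keep; `/` and `!` are OtherPunctuation and are dropped as well.

```csharp
using System;
using System.Globalization;
using System.Text;

// "rè/ !" written with an escape so the source file encoding doesn't matter
string decomposed = "r\u00E8/ !".Normalize(NormalizationForm.FormD);

foreach (char c in decomposed)
{
    // Print each UTF-16 code unit with its Unicode category
    Console.WriteLine($"U+{(int)c:X4} {CharUnicodeInfo.GetUnicodeCategory(c)}");
}
```

On a standard .NET runtime this should print `U+0072 LowercaseLetter`, `U+0065 LowercaseLetter`, `U+0300 NonSpacingMark`, `U+002F OtherPunctuation`, `U+0020 SpaceSeparator` and `U+0021 OtherPunctuation`, one per line.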

Jens
  • Thanks Jens for your code. Just one last question: if my string were encoded like this: %23Hi%20this%20%20%20%20%20%20%20%20%20%20is%20%20r%C3%A8ally/%20special%20str%C3%ACng!!! how can I decode it before using it in your code? Thanks for your help on this! – GibboK Aug 09 '11 at 07:27
  • You can use HttpUtility.UrlDecode from the System.Web assembly. – Jens Aug 09 '11 at 07:49
  • Great answer. It's a shame that URLEncode() still doesn't offer the basic level functionality that its method name would imply. – Eric Jan 04 '14 at 20:32
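Following up on the comment thread, a minimal sketch of decoding first with `HttpUtility.UrlDecode` and then applying the answer's filtering. `ToSlug` is a hypothetical helper name: it is the answer's loop wrapped in a local function, with a `Trim()` added so leading or trailing spaces don't turn into dashes. `UrlDecode` decodes percent-escapes as UTF-8 by default, so `%C3%A8` becomes `è`.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;
using System.Web; // HttpUtility (System.Web assembly on .NET Framework)

string encoded = "%23Hi%20this%20%20%20%20%20%20%20%20%20%20is%20%20r%C3%A8ally/%20special%20str%C3%ACng!!!";

// Step 1: undo the percent-encoding (%20 -> space, %C3%A8 -> è, %23 -> #)
string decoded = HttpUtility.UrlDecode(encoded);

// Step 2: hypothetical helper containing the answer's filter
static string ToSlug(string s)
{
    string normalized = s.Normalize(NormalizationForm.FormD);
    var builder = new StringBuilder();
    foreach (char c in normalized)
    {
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
        if (category == UnicodeCategory.LowercaseLetter
            || category == UnicodeCategory.UppercaseLetter
            || category == UnicodeCategory.SpaceSeparator)
            builder.Append(c);
    }
    // Collapse each whitespace run into a single dash
    return Regex.Replace(builder.ToString().Trim(), @"\s+", "-");
}

Console.WriteLine(ToSlug(decoded)); // Hi-this-is-really-special-string
```

Decoding has to happen before the filter: otherwise `%` would be stripped as a special character and the hex digits (`C3`, `A8`, `20`) would leak into the slug as letters.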

You should have a look at this answer: Ignoring accented letters in string comparison

Code here:

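using System.Globalization;
using System.Text;

static string RemoveDiacritics(string sIn)
{
  // FormD decomposes "è" into "e" + combining grave accent (U+0300)
  string sFormD = sIn.Normalize(NormalizationForm.FormD);
  StringBuilder sb = new StringBuilder();

  foreach (char ch in sFormD)
  {
    // Keep everything except the combining marks (the accents themselves)
    UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
    if (uc != UnicodeCategory.NonSpacingMark)
    {
      sb.Append(ch);
    }
  }

  // Recompose whatever is left into its canonical composed form
  return sb.ToString().Normalize(NormalizationForm.FormC);
}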
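`RemoveDiacritics` on its own only covers requirement b). One way to cover a) and c) as well is to run two regular-expression replacements on its output. This is a sketch under an assumption: it treats "special character" as anything that is neither a letter (`\p{L}`) nor whitespace, so digits are dropped too. `Slugify` is a hypothetical wrapper name.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// mathieu's helper: strip combining marks after FormD decomposition
static string RemoveDiacritics(string sIn)
{
    string sFormD = sIn.Normalize(NormalizationForm.FormD);
    var sb = new StringBuilder();
    foreach (char ch in sFormD)
    {
        if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
            sb.Append(ch);
    }
    return sb.ToString().Normalize(NormalizationForm.FormC);
}

// Hypothetical wrapper covering requirements a), b) and c)
static string Slugify(string input)
{
    string plain = RemoveDiacritics(input);           // b) rèally -> really
    plain = Regex.Replace(plain, @"[^\p{L}\s]", "");  // a) drop anything that is not a letter or whitespace
    return Regex.Replace(plain.Trim(), @"\s+", "-");  // c) each whitespace run -> one dash
}

Console.WriteLine(Slugify("#Hi this          is  rèally/ special strìng!!!"));
// Hi-this-is-really-special-string
```

If digits should survive (common in URL slugs), widening the character class to `[^\p{L}\p{Nd}\s]` would keep decimal digits.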
Community
mathieu

I am not an expert when it comes to regular expressions, but I doubt they would be useful for this sort of computation.

To me, a simple iteration over the characters of the input is enough:

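using System.Collections.Generic;
using System.Text;

// Characters to drop entirely (the original list contained '/' twice)
var specialChars = new HashSet<char> { '!', '"', '£', '$', '%', '&', '/', '(', ')', '#' };

string specialString = "#Hi this          is  rèally/ special strìng!!!";

var builder = new StringBuilder(specialString.Length);

// True when one or more spaces/tabs were skipped since the last kept character
bool encounteredWhiteSpace = false;

foreach (char ch in specialString)
{
    char val = ch;

    // a) skip special characters
    if (specialChars.Contains(val))
        continue;

    // b) map the accented letters we know about to their plain form
    switch (val)
    {
        case 'è':
            val = 'e'; break;
        case 'à':
            val = 'a'; break;
        case 'ò':
            val = 'o'; break;
        case 'ù':
        case 'ü':
            val = 'u'; break;
        case 'ı':
        case 'ì':
            val = 'i'; break;
    }

    // c) remember whitespace instead of copying it
    if (val == ' ' || val == '\t')
    {
        encounteredWhiteSpace = true;
        continue;
    }

    // Emit one dash per whitespace run, but never at the very start of the result
    if (encounteredWhiteSpace && builder.Length > 0)
        builder.Append('-');
    encounteredWhiteSpace = false;

    builder.Append(val);
}

string result = builder.ToString(); // "Hi-this-is-really-special-string"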
tafa