I am converting the JS slugify function by diegok to C#, and it uses this JavaScript construct:

function turkish_map() {
    return {
        'ş':'s', 'Ş':'S', 'ı':'i', 'İ':'I', 'ç':'c', 'Ç':'C', 'ü':'u', 'Ü':'U',
        'ö':'o', 'Ö':'O', 'ğ':'g', 'Ğ':'G'
    };
}

It is a map of char-to-char translations. However, I don't know what this JS construct is called, or how it could be rewritten in C#, preferably without spending too much time on the rewrite. (There's more to it; this is just one of the functions.)

Should I make an array, dictionary, something else?

mare
  • The reason I am converting is because I need a server side C# method that works the same as the JS one (same slugs generated from the same input). – mare Jun 10 '11 at 11:51
  • Look at this http://stackoverflow.com/questions/3769457/how-can-i-remove-accents-on-a-string/3769995#3769995 – BrunoLM Jun 10 '11 at 11:56

2 Answers

Dictionary<char, char> turkish_map() {
    return new Dictionary<char, char> {
        {'ş','s'}, {'Ş','S'}, {'ı','i'}, {'İ','I'}, {'ç','c'}, {'Ç','C'}, {'ü','u'}, {'Ü','U'},
        {'ö','o'}, {'Ö','O'}, {'ğ','g'}, {'Ğ','G'}
    };
}

Then use it like:

turkish_map()['İ'] // returns I

Or you can store it in a field and reuse it instead of creating it on every call.
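
A minimal sketch of that field-based variant, assuming a hypothetical `TurkishFolding` class and `Fold` helper (neither name comes from the original slugify code). The dictionary is built once and every character of the input is looked up in it:

```csharp
using System.Collections.Generic;
using System.Text;

public static class TurkishFolding
{
    // Built once and reused; the entries mirror the JS turkish_map() object.
    private static readonly Dictionary<char, char> TurkishMap = new Dictionary<char, char>
    {
        {'ş','s'}, {'Ş','S'}, {'ı','i'}, {'İ','I'}, {'ç','c'}, {'Ç','C'},
        {'ü','u'}, {'Ü','U'}, {'ö','o'}, {'Ö','O'}, {'ğ','g'}, {'Ğ','G'}
    };

    // Replaces every mapped character and leaves all other characters untouched.
    public static string Fold(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            char mapped;
            sb.Append(TurkishMap.TryGetValue(c, out mapped) ? mapped : c);
        }
        return sb.ToString();
    }
}
```

With this sketch, `TurkishFolding.Fold("İstanbul")` returns `Istanbul`. `TryGetValue` is used rather than the indexer because the indexer throws for characters that are not in the map.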

Euphoric
  • That should probably be `return new Dictionary() { ...`? And "you can save it into field" should not be a *suggestion* if you opted for a dictionary, that is the only reasonable way to use it. – vgru Jun 10 '11 at 11:59
  • You don't need the parentheses on a collection initialiser. – Ciaran Jun 10 '11 at 13:25
  • I realize this answer was most directly related to your question, but BrunoLM's answer is actually the most appropriate way to do this. Anytime you resort to culture-specific dictionaries for something like this, you're setting yourself up for a maintenance nightmare. – Chris Jun 10 '11 at 14:46
  • @Ciaran: when I wrote my comment there was no `new` keyword either. But I am more concerned about instantiating a new Dictionary on each access to a method, because even traversing a plain array would yield better performance than this. I know that this may only serve as an example, but I've never seen a dictionary being used this way. – vgru Jun 10 '11 at 19:22

Use these methods to remove diacritics. For the input şŞıİçÇüÜöÖğĞ the result will be sSıIcCuUoOgG.

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

namespace Test
{
    public class Program
    {

        public static IEnumerable<char> RemoveDiacriticsEnum(string src, bool compatNorm, Func<char, char> customFolding)
        {
            foreach (char c in src.Normalize(compatNorm ? NormalizationForm.FormKD : NormalizationForm.FormD))
                switch (CharUnicodeInfo.GetUnicodeCategory(c))
                {
                    case UnicodeCategory.NonSpacingMark:
                    case UnicodeCategory.SpacingCombiningMark:
                    case UnicodeCategory.EnclosingMark:
                        //do nothing
                        break;
                    default:
                        yield return customFolding(c);
                        break;
                }
        }
        public static IEnumerable<char> RemoveDiacriticsEnum(string src, bool compatNorm)
        {
            return RemoveDiacriticsEnum(src, compatNorm, c => c);
        }
        public static string RemoveDiacritics(string src, bool compatNorm, Func<char, char> customFolding)
        {
            StringBuilder sb = new StringBuilder();
            foreach (char c in RemoveDiacriticsEnum(src, compatNorm, customFolding))
                sb.Append(c);
            return sb.ToString();
        }
        public static string RemoveDiacritics(string src, bool compatNorm)
        {
            return RemoveDiacritics(src, compatNorm, c => c);
        }


        static void Main(string[] args)
        {
            var str = "şŞıİçÇüÜöÖğĞ";

            Console.Write(RemoveDiacritics(str, false));

            // output: sSıIcCuUoOgG

            Console.ReadKey();
        }
    }
}

Some characters, such as ı, are not converted because they have no Unicode decomposition, and symbols such as the @ you mentioned are not diacritics at all. For those, first remove the diacritics with the method above, then use a regex to remove any remaining invalid characters. If there are particular characters you want to keep in some form, put them in a Dictionary<char, char> and replace each of them before running the regex.

Then you can do this:

var input = "Şöme-p@ttern"; // text to convert into a slug
var replaces = new Dictionary<char, char> { { '@', 'a' } }; // chars you care about and their replacements
var pattern = @"[^A-Z0-9_-]+"; // regex to remove invalid characters

var result = new StringBuilder(RemoveDiacritics(input, false)); // convert Ş to S
                                                                // and so on

foreach (var item in replaces)
{
    result = result.Replace(item.Key, item.Value); // replace @ with a and so on
}

// remove invalid characters which weren't converted
var slug = Regex.Replace(result.ToString(), pattern, String.Empty,
    RegexOptions.IgnoreCase);

// output: Some-pattern
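
The steps above can be folded into one helper. This is only a sketch under stated assumptions: the `SlugSketch` class, the `ToSlug` method and the contents of the `Extra` map are hypothetical, and the regex lists both letter cases explicitly instead of using `RegexOptions.IgnoreCase`. It is not guaranteed to produce the same slugs as the JS function.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class SlugSketch
{
    // Characters that Unicode decomposition leaves alone (assumed list; extend as needed).
    private static readonly Dictionary<char, char> Extra = new Dictionary<char, char>
    {
        { 'ı', 'i' }, { '@', 'a' }
    };

    public static string ToSlug(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input.Normalize(NormalizationForm.FormD))
        {
            // Skip the combining marks that decomposition split off the base letters.
            UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(c);
            if (cat == UnicodeCategory.NonSpacingMark ||
                cat == UnicodeCategory.SpacingCombiningMark ||
                cat == UnicodeCategory.EnclosingMark)
                continue;

            // Apply the custom replacements, otherwise keep the character as is.
            char mapped;
            sb.Append(Extra.TryGetValue(c, out mapped) ? mapped : c);
        }

        // Drop anything that is still not an ASCII letter, digit, underscore or hyphen.
        return Regex.Replace(sb.ToString(), @"[^A-Za-z0-9_-]+", String.Empty);
    }
}
```

For example, `SlugSketch.ToSlug("Şöme-p@ttern")` returns `Some-pattern`.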
BrunoLM
  • Why the downvote? I honestly think this is the best approach. The other way is to create tons of dictionaries to convert the characters... – BrunoLM Jun 10 '11 at 12:49
  • I didn't downvote it; I'm going to upvote it. It is a nice solution, though I need the dictionaries because this solution does not handle all of the characters, for instance symbols like @, TM, ... – mare Jun 10 '11 at 14:27
  • @mare thanks. I've updated my answer with more information on how to proceed with those special characters. Hope you find it useful. – BrunoLM Jun 10 '11 at 14:42