10

We have an ASP.NET application that runs for different clients around the world. In this application we have a dictionary for each language. The dictionaries store words in lowercase, and sometimes we uppercase them in code for typographic reasons.

var greek= new CultureInfo("el-GR");
string grrr = "Πόλη";
string GRRR = grrr.ToUpper(greek); // "ΠΌΛΗ"

The problem is:

...if you're using capital letters then they must appear like this: f.e. ΠΟΛΗ and not like ΠΌΛΗ, same for all other words written in capital letters

So is there a generic way to uppercase Greek words correctly in .NET? Or should I write my own custom algorithm for Greek uppercase?

How do they solve this problem in Greece?

casperOne
Jakub Šturc
  • Where are you quoting the rule from? – AakashM Jan 07 '10 at 13:22
  • @AakashM: from the communication with the client. – Jakub Šturc Jan 07 '10 at 13:24
  • Have you verified that their requirement is actually true for Greek in general, or just true for them? – Jon Seigel Jan 07 '10 at 13:26
  • @Jon Seigel: I believe that this requirement is reasonable. I've been to Greece on holiday and I didn't see uppercase letters with diacritic signs, IIRC. I also googled through some Greek sites and didn't find a counterexample. – Jakub Šturc Jan 07 '10 at 13:30
  • Yes, the requirement is valid. The only exception is the `¨` diacritic on vowel characters, but it is common not to include it in capitalized words. – apod Jan 07 '10 at 13:38
  • @AakashM, @Jon Seigel: "When a word is written entirely in capital letters, diacritics are never used; the word Ἢ ("or") is an exception to this rule because of the need to distinguish it from the nominative feminine article Η." from http://tinyurl.com/yeack9o – Jakub Šturc Jan 07 '10 at 13:39

4 Answers

4

I suspect that you're going to have to write your own method if el-GR doesn't do what you want. I don't think you need to go to the full length of creating a custom CultureInfo if this is all you need, which is good, because that looks quite fiddly.

What I do suggest is that you read this Michael Kaplan blog post and anything else relevant you can find by him. He's been working on and writing about i18n and language issues for years, and his commentary is my first port of call for any such issues on Windows.

AakashM
2

I don't know much about ASP.NET, but I know how I'd do this in Java.

If the characters are Unicode, I would just post-process the output from ToUpper with some simple substitutions, one being the conversion of \u038C (Ό) to \u039F (Ο) or \u0386 (Ά) to \u0391 (Α).

From the looks of the Greek and Coptic block (\u0370 through \u03ff), there are only a few characters (6 or 7) you'll need to change.
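
In C#, that post-processing step might look roughly like the sketch below. The class and method names (`GreekUpper`, `StripTonos`) are made up for illustration and are not framework APIs. The sketch assumes the input has already been uppercased and that only the seven precomposed tonos capitals need replacing; it leaves the dialytika forms (Ϊ, Ϋ) untouched.

```csharp
using System.Text;

public static class GreekUpper
{
    // Hypothetical helper: replaces each accented (tonos) capital
    // with its plain counterpart. Expects already-uppercased input.
    public static string StripTonos(string upper)
    {
        var sb = new StringBuilder(upper.Length);
        foreach (char c in upper)
        {
            switch (c)
            {
                case '\u0386': sb.Append('\u0391'); break; // Ά -> Α
                case '\u0388': sb.Append('\u0395'); break; // Έ -> Ε
                case '\u0389': sb.Append('\u0397'); break; // Ή -> Η
                case '\u038A': sb.Append('\u0399'); break; // Ί -> Ι
                case '\u038C': sb.Append('\u039F'); break; // Ό -> Ο
                case '\u038E': sb.Append('\u03A5'); break; // Ύ -> Υ
                case '\u038F': sb.Append('\u03A9'); break; // Ώ -> Ω
                default: sb.Append(c); break;              // everything else unchanged
            }
        }
        return sb.ToString();
    }
}
```

Usage would be something like `GreekUpper.StripTonos("Πόλη".ToUpperInvariant())`, which gives "ΠΟΛΗ".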

paxdiablo
  • That's evil. What if the customer one day inputs such a character willingly? – Maximilian Mayerl Jan 07 '10 at 13:55
  • It's already been stated in the comments that uppercase Greek words never have diacritics except for the word Ἢ: "When a word is written entirely in capital letters, diacritics are never used; the word Ἢ ("or") is an exception to this rule" (see http://www.statemaster.com/encyclopedia/Diacritics-%28Greek-alphabet%29) – paxdiablo Jan 07 '10 at 14:30
  • So? I, as the customer, want the program not to change the uppercase I have already written myself, even if it contains such a diacritic. It's up to me what I want to save in a text field, and if your program changes my explicitly written text, I'm going to pester your support department until they run crying out of the company building. ;) Well, maybe a little exaggerated, but my point is that this would be a quick and dirty solution which should never be in a production system. – Maximilian Mayerl Jan 07 '10 at 14:46
  • @MM, I understand that, but you're *not* the customer, Jakub is, and he has stated that they must appear that way. If you want to ask a *different* question, do so by all means. – paxdiablo Jan 07 '10 at 15:17
2

Check out How do I remove diacritics (accents) from a string in .NET?
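
One common way to strip diacritics in .NET is to decompose the string with `Normalize(NormalizationForm.FormD)`, drop the combining marks, and recompose. A minimal sketch of that idea applied to Greek uppercasing follows. The names `GreekCasing`, `ToUpperWithoutAccents` and the `keepDialytika` flag are assumptions made for this example, not part of the framework. `ToUpperInvariant` is used only to keep the sketch independent of the machine's culture; `ToUpper` with an el-GR `CultureInfo` could be used instead.

```csharp
using System.Globalization;
using System.Text;

public static class GreekCasing
{
    // Hypothetical helper: removes combining marks, then uppercases.
    // When keepDialytika is true, U+0308 (the diaeresis) is preserved.
    public static string ToUpperWithoutAccents(string source, bool keepDialytika = false)
    {
        // FormD splits e.g. ό into ο followed by U+0301 (combining acute).
        string decomposed = source.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            bool isMark = CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark;
            bool isDialytika = c == '\u0308';
            if (!isMark || (keepDialytika && isDialytika))
            {
                sb.Append(c);
            }
        }
        // Uppercase the stripped text and recompose any marks that were kept.
        return sb.ToString().ToUpperInvariant().Normalize(NormalizationForm.FormC);
    }
}
```

With this sketch, `GreekCasing.ToUpperWithoutAccents("Πόλη")` gives "ΠΟΛΗ". Note that it strips marks from every script in the input, not only Greek, so it may be too aggressive for mixed-language text.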

Eduardo Molteni
2

How about replacing the wrong characters with the right ones:

using System.Collections.Generic;

public static class GreekStringExtensions
{
    // Accented (tonos) capitals mapped to their unaccented forms.
    // Built once instead of on every call.
    private static readonly Dictionary<char, char> Mappings = new Dictionary<char, char>
    {
        {'Ά','Α'}, {'Έ','Ε'}, {'Ή','Η'}, {'Ί','Ι'}, {'Ό','Ο'}, {'Ύ','Υ'}, {'Ώ','Ω'}
    };

    /// <summary>
    /// Returns the string in uppercase using Greek uppercase rules.
    /// </summary>
    /// <param name="source">The string that will be converted to uppercase</param>
    public static string ToUpperGreek(this string source)
    {
        // Uppercase first, then swap each accented capital for its plain form.
        char[] result = source.ToUpper().ToCharArray();
        for (int i = 0; i < result.Length; i++)
        {
            if (Mappings.TryGetValue(result[i], out char plain))
            {
                result[i] = plain;
            }
        }

        return new string(result);
    }
}
sath