I have the following code:

 string firstMicro = "aa \u00b5 bb";   // U+00B5 MICRO SIGN
 string secondMicro = "aa \u03bc bb";  // U+03BC GREEK SMALL LETTER MU

 Assert.IsFalse(firstMicro == secondMicro);

 string upperFirstMicro = firstMicro.ToUpper();
 string upperSecondMicro = secondMicro.ToUpper();

 Assert.IsFalse(upperFirstMicro == upperSecondMicro);

In my case, the first test passes (obviously, the two strings are different), but the second test fails because both upper-cased strings are identical (AA M BB). I admit that I should have used CultureInfo in one of the cases, but at least the first string (the micro sign from the ASCII code) should have stayed the same... Can someone help me understand why this is happening?

Apparently, there is another thread here, "How to correctly uppercase Greek words in .NET?", but it has no obvious answer...

Thanks. D.

dcg
  • `ToUpper` uses the current `CultureInfo` (unless you pass a culture explicitly). What is your current `CultureInfo`? – Aliostad Mar 08 '12 at 11:37
  • [What's the rationale for the second test?](http://stackoverflow.com/a/9617044/7724) What would you expect the resulting two upper-case strings to actually be? Would you expect the U+00B5 to be unaltered since it's "not really a letter"? Also, ASCII is not involved here. – bzlm Mar 08 '12 at 11:37
  • And why have you **not** used `CultureInfo`, if you know it's the correct thing to do? – Oded Mar 08 '12 at 11:37
  • @Oded CultureInfo or not doesn't really matter here, does it? The second assertion is incorrect as per the Unicode specifications. – bzlm Mar 08 '12 at 11:38
  • @bzlm - Fair enough, didn't check the spec first, but the OP states that he "should have used CultureInfo". I was inquiring as to why he didn't. – Oded Mar 08 '12 at 11:40
  • @bzlm - yes, I'm (or 'was') expecting "\u00b5" to stay the same character, especially considering that it's not 'part' of the Greek alphabet. Its original purpose was totally different. – dcg Mar 08 '12 at 12:13
  • @Oded - I cannot set a specific culture info because I am using this in a search in an international application with a multilingual backend DB. – dcg Mar 08 '12 at 12:20

2 Answers

5

A microsecond is still a µSEC after upper-casing. Having it upcased to MSEC would fatally alter its meaning. Which is why there are two codepoints for the glyph.

Hans Passant
  • The problem has already been answered, and I think the guy who answered it is right. In my case, µ (micro) has two 'meanings': the physical symbol, as you mentioned, and the Greek letter. Both have mappings in the Unicode table that point to the same character. – dcg Apr 28 '12 at 09:23
  • I wasn't convinced by the "Sorry, but that's how Unicode is defined" answer, so I posted my own. That's perfectly okay at SO. – Hans Passant Apr 28 '12 at 09:31
  • I agree with Hans: clearly U+00B5 should neither have an upper-case letter nor be classified as a Lowercase Letter (Ll), but rather as Po (Punctuation, other). IMO it's a bug in Unicode. – Panos Theof Jan 07 '14 at 17:45
4

Some lowercase letters have the same uppercase equivalent. Sorry, but that's how Unicode is defined.

For example, as you can see in the official UnicodeData.txt, both U+0069 (i) and U+0131 (ı) have U+0049 (I) for uppercase.
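
To see what your own runtime does with the two characters from the question, you can print the code points before and after upper-casing. This is only a minimal sketch, assuming a plain console program and the invariant culture; the `CaseMappingCheck` class and its `Dump` helper are hypothetical names made up for this illustration, and the exact output may depend on the .NET version and platform.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class CaseMappingCheck
{
    // Hypothetical helper: renders each UTF-16 code unit of s as U+XXXX.
    public static string Dump(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    public static void Main()
    {
        string microSign = "\u00b5"; // U+00B5 MICRO SIGN
        string greekMu = "\u03bc";   // U+03BC GREEK SMALL LETTER MU

        foreach (string s in new[] { microSign, greekMu })
        {
            // Upper-case with an explicit culture so the result does not
            // depend on the current thread culture.
            string upper = s.ToUpper(CultureInfo.InvariantCulture);
            Console.WriteLine($"{Dump(s)} -> {Dump(upper)}");
        }
    }
}
```

If both lines end in the same code point, the two inputs collapse to one upper-case character on your platform, which is the behaviour described in the question.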

What is the exact problem you're having? Maybe we can help.

Mr Lister
  • [Bad example.](http://www.fileformat.info/info/unicode/char/0069/index.htm) [Turkish `CultureInfo` doesn't have U+0049 for uppercase for i.](http://stackoverflow.com/a/3550226/7724) – bzlm Mar 08 '12 at 11:45
  • Fair enough; I should have said "in the absence of any culture info". And that data file I linked to does have problems of its own... for instance, it says that the lowercase of `U+1E9E (ẞ)` is `U+00DF (ß)`. But `U+00DF (ß)` does not have an uppercase equivalent! Still, it's official. – Mr Lister Mar 08 '12 at 11:52
  • [The case folding document is also official.](ftp://ftp.unicode.org/Public/UNIDATA/CaseFolding.txt) :) – bzlm Mar 08 '12 at 11:53
  • @Mr Lister - yes, that's the problem I am having. Thanks for your answer. It's clear now. – dcg Mar 08 '12 at 12:21