Why is ToUpper faster than ToLower?

Question

```csharp
using System;
using System.Diagnostics;

Stopwatch stopwatch1 = new Stopwatch();
Stopwatch stopwatch2 = new Stopwatch();
string l = "my test";
string u = "MY TEST";

// l += l doubles each string, so after 25 passes both hold 7 * 2^25 characters
for (int i = 0; i < 25; i++)
{
    l += l;
    u += u;
}

stopwatch1.Start();
l = l.ToUpper();
stopwatch1.Stop();

stopwatch2.Start();
u = u.ToLower();
stopwatch2.Stop();

// Write result.
Console.WriteLine("Time elapsed: \nUPPER : {0}\nLOWER : {1}",
                  stopwatch1.Elapsed, stopwatch2.Elapsed);
```

I have run it many times:

UPPER : 00:00:01.3386287
LOWER : 00:00:01.4546552

UPPER : 00:00:01.1614189
LOWER : 00:00:01.1970368

UPPER : 00:00:01.2697430
LOWER : 00:00:01.3460950

UPPER : 00:00:01.2256813
LOWER : 00:00:01.3075738

In the first instance, you should time more than one conversion! Try something like 10000. — Mitch Wheat, Dec 20 '16 at 08:53
You should use ToUpperInvariant and ToLowerInvariant. And you should use BenchmarkDotNet to do tests. — Jnavero, Dec 20 '16 at 08:59
[Upper vs Lower Case](http://stackoverflow.com/questions/234591/upper-vs-lower-case) — Sen Jacob, Dec 20 '16 at 09:01
[This might be the answer you are looking for](http://stackoverflow.com/questions/9033/hidden-features-of-c#12137). "When normalizing strings, it is highly recommended that you use ToUpperInvariant instead of ToLowerInvariant **because Microsoft has optimized the code for performing uppercase comparisons**." — Sen Jacob, Dec 20 '16 at 09:02
@SenJacob, reverse the operations and you'll see that the first one is still the fastest. Timings of single calls have no meaning. — Panagiotis Kanavos, Dec 20 '16 at 09:04
I've reopened the question since actually `ToUpper` *is not* faster than `ToLower`, at least on the data provided. — Dmitry Bychenko, Dec 20 '16 at 09:33
If Microsoft has optimized the code for performing uppercase comparisons, is it because the ASCII codes for uppercase letters have only two digits (65-90), while the ASCII codes for lowercase letters (97-122) have three digits (needing more processing)? — Medo Medo, Dec 20 '16 at 09:43
Why do you think it needs more processing? They are still the same bytes, right? — Patrick Hofman, Dec 20 '16 at 10:00
@SenJacob: Your comment is referring to `ToUpperInvariant`. That is a small but very important difference. The invariant conversion is not the same as the culture-specific conversion of `ToUpper`. I would also dispute the claimed optimization of `ToUpperInvariant` over `ToLowerInvariant`. That is basically the difference between performing a `+` and a `-`. Unless your source makes a compelling argument for such a claim, I would doubt its correctness. — Sefe, Dec 20 '16 at 12:23
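Sefe's "`+` versus `-`" point (and the answer to the question about two- versus three-digit codes) can be illustrated with an ordinal, ASCII-only conversion. This is a minimal sketch under that assumption, written as a C# 9+ top-level program: the helpers `ToUpperAscii` and `ToLowerAscii` are hypothetical, and the real `ToUpper`/`ToLower` (and their `Invariant` variants) handle all of Unicode and culture rules, so this is not how the framework implements them. It only shows that, for ASCII letters, each direction is one subtraction or addition of 32 on a 16-bit `char`, and that 65 and 122 occupy the same storage.

```csharp
using System;

// Hypothetical ASCII-only, ordinal helpers (not the BCL implementation).
// 'a' (97) - 'A' (65) == 32, so each direction is one add or subtract per char.
static string ToUpperAscii(string s)
{
    char[] buffer = s.ToCharArray();
    for (int i = 0; i < buffer.Length; i++)
    {
        char c = buffer[i];
        if (c >= 'a' && c <= 'z')
            buffer[i] = (char)(c - 32); // lower -> upper: subtract 32
    }
    return new string(buffer);
}

static string ToLowerAscii(string s)
{
    char[] buffer = s.ToCharArray();
    for (int i = 0; i < buffer.Length; i++)
    {
        char c = buffer[i];
        if (c >= 'A' && c <= 'Z')
            buffer[i] = (char)(c + 32); // upper -> lower: add 32
    }
    return new string(buffer);
}

Console.WriteLine((int)'A');     // 65
Console.WriteLine((int)'z');     // 122
Console.WriteLine(sizeof(char)); // 2: both codes are one 16-bit UTF-16 code unit
Console.WriteLine(ToUpperAscii("my test")); // MY TEST
Console.WriteLine(ToLowerAscii("MY TEST")); // my test
```

Neither direction does more work than the other in this sketch; the number of decimal digits in a character code never enters the computation.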

Dmitry Bychenko · Answer 1 · 2016-12-20T10:01:29.370

6

Let's try to reproduce the result:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Please notice: the same string is used for both ToUpper and ToLower
string GiniPig = string.Concat(Enumerable
  .Range(1, 1000000) // a million "my test/MY TEST" chunks combined (a long string)
  .Select(item => "my test/MY TEST"));

Stopwatch sw = new Stopwatch();

// Let's try n (100) times, not just once
int n = 100;

double[] sampling = Enumerable
  .Range(1, n)
  .Select(x => {
     sw.Reset();
     sw.Start();

     GiniPig.ToLower(); // change this into .ToUpper();

     sw.Stop();
     return (double)sw.ElapsedMilliseconds; })
  .ToArray(); // plain array instead of a side library; you may also save the data and analyze it with R

// Mean and (sample) standard deviation of the timings
double mean = sampling.Average();
double stdDev = Math.Sqrt(sampling.Sum(t => (t - mean) * (t - mean)) / (n - 1));

Console.Write(
  $"N = {n}; mean = {mean:F0}; std dev = {stdDev:F0}");
```

Having run it several times (to warm up), I got these results (Core i7 3.6 GHz, .Net 4.6 IA-64):

ToLower: N = 100; mean = 38; std dev = 8
ToUpper: N = 100; mean = 37; std dev = 9

So you can't reject the null hypothesis that ToLower is as fast as ToUpper, and thus your experiment has errors:

1. You process different strings with each method.
2. You process each string just once (not in a loop), so a single timing tells you little and the error can be enormous. (Note that `l += l` doubles the string on each pass, so after 25 passes each string holds 7 × 2^25 = 234,881,024 characters.)
3. You have to warm up the routine (so that methods get JIT-compiled, assemblies loaded, caches filled, etc.).
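The length of the question's strings can be checked directly: `l += l` doubles the string on each of the 25 passes, so the 7-character seed is multiplied by 2^25. This sketch (a C# 9+ top-level program) follows the question's loop but tracks only a `long` length counter in place of the string, so nothing large is allocated.

```csharp
using System;

// Mirror the question's loop, tracking only the length instead of building the string.
long length = "my test".Length; // 7 characters
for (int i = 0; i < 25; i++)
{
    length += length; // same effect on the length as l += l: it doubles
}

Console.WriteLine(length);                                // 234881024
Console.WriteLine(length == 7L * (1L << 25));             // True
Console.WriteLine(length * sizeof(char) / (1024 * 1024)); // 448 (MiB of UTF-16 data per copy)
```

So each `ToUpper`/`ToLower` call in the question reads roughly 235 million characters and allocates a result of the same size, which on its own makes timings on the order of a second unsurprising.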

Judging by the elapsed time (more than 1 second for a very simple operation), it seems that breaking rule #3 (warming up) is what ruined the experiment.
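The three points above can be combined into one self-contained harness. This is a sketch (a C# 9+ top-level program), not a substitute for a dedicated tool such as BenchmarkDotNet, which was suggested in the comments. It uses the same input for both methods, makes warm-up calls before timing, alternates the order of `ToUpper` and `ToLower` across runs, and reports mean and sample standard deviation without a side library. The culture is passed explicitly (`CultureInfo.InvariantCulture` here is an assumption; any `CultureInfo` can be substituted) so both conversions run under the same rules. The input size and run count are arbitrary choices.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.Linq;

// Same input for both conversions: mixed-case ASCII text (size is arbitrary).
string input = string.Concat(Enumerable.Repeat("my test/MY TEST", 100_000));
CultureInfo culture = CultureInfo.InvariantCulture; // assumption: pin the culture explicitly

const int runs = 50;
var upperMs = new double[runs];
var lowerMs = new double[runs];

// Warm-up: JIT-compile both paths and load casing data before measuring.
input.ToUpper(culture);
input.ToLower(culture);

for (int i = 0; i < runs; i++)
{
    // Alternate the order so neither method always runs first.
    if (i % 2 == 0)
    {
        upperMs[i] = Time(() => input.ToUpper(culture));
        lowerMs[i] = Time(() => input.ToLower(culture));
    }
    else
    {
        lowerMs[i] = Time(() => input.ToLower(culture));
        upperMs[i] = Time(() => input.ToUpper(culture));
    }
}

Console.WriteLine($"ToUpper: {Summarize(upperMs)}");
Console.WriteLine($"ToLower: {Summarize(lowerMs)}");

// Wall-clock time of one call in milliseconds (fractional, unlike ElapsedMilliseconds).
static double Time(Func<string> action)
{
    var sw = Stopwatch.StartNew();
    action();
    sw.Stop();
    return sw.Elapsed.TotalMilliseconds;
}

// Mean and sample standard deviation, formatted with invariant number formatting.
static string Summarize(double[] samples)
{
    double mean = samples.Average();
    double variance = samples.Sum(x => (x - mean) * (x - mean)) / (samples.Length - 1);
    return FormattableString.Invariant(
        $"N = {samples.Length}; mean = {mean:F2} ms; std dev = {Math.Sqrt(variance):F2} ms");
}
```

If the two means differ by much less than their standard deviations, the data give no reason to prefer one method over the other.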

edited Dec 20 '16 at 10:01

answered Dec 20 '16 at 09:10

Dmitry Bychenko


As people have commented, "Microsoft has optimized the code for performing uppercase comparisons." Is it because the ASCII codes for uppercase letters have only two digits (65-90), while the ASCII codes for lowercase letters (97-122) have three digits (needing more processing)? – Medo Medo Dec 20 '16 at 09:44

@Medo Medo: both `65` and `122` are either a *single* `char` or a *single* `byte`; that's why the time (in your example) will be the same (the CPU doesn't operate on *digits*, but on `byte`s, `int`s, `long`s, etc.) – Dmitry Bychenko Dec 20 '16 at 09:53
When you reproduce the result, are you sure you are on the same culture as the OP? Check my reply. – Sefe Dec 20 '16 at 10:28
@Sefe: The experiment has been carried out on pure *ASCII* text, so culture should not be the issue. Only if the text contained culture-specific letters (say, a string partly in Russian, partly in Georgian and partly in Chinese) could the culture have been the key feature, and it might have turned out that under such conditions `ToUpper` is faster/slower than `ToLower`. – Dmitry Bychenko Dec 20 '16 at 11:18
Well, the logic of converting characters from one case to another is at least partially executed by culture-specific code. Regardless of the input values, there can be differences in the respective implementations that cause different performance. – Sefe Dec 20 '16 at 11:19

@Sefe: I've tried the same experiment for the cultures `"en-US"`, `"de-De"` (German), `"ka-Ge"` (Georgian) and `"ru-Ru"` (Russian), and I got the same results: mean `38` with a standard deviation of `9`. – Dmitry Bychenko Dec 20 '16 at 11:52

Sefe · Answer 2 · 2016-12-21T07:03:19.713

3

Your initial hypothesis of ToUpper being faster than ToLower has a logical fallacy.

Your conversions are culture-sensitive. You are not performing an ordinal operation; you are performing an operation that depends on your current culture (as returned by CultureInfo.CurrentCulture). A conversion from lower case to upper case might be faster in the culture that you are using and slower in another. A conversion in one culture might also be faster than a conversion in another culture.

So your initial assumption that ToUpper and ToLower each have a single, culture-independent performance is false.
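To make the culture dependence concrete, here is a small sketch (a C# 9+ top-level program) that converts one input under different cultures. It relies on the well-known Turkish casing rule (lowercase `i` uppercases to dotted `İ`, U+0130, in `tr-TR`) and assumes the runtime has culture data available; it will not behave this way in .NET's invariant-globalization mode. It shows that culture changes the work `ToUpper` does, and that the parameterless overload uses `CultureInfo.CurrentCulture`, so the same call can behave differently depending on the machine or thread.

```csharp
using System;
using System.Globalization;

var turkish = new CultureInfo("tr-TR");
var english = new CultureInfo("en-US");
string word = "istanbul";

string upperTr = word.ToUpper(turkish);    // culture-specific: i -> İ (U+0130)
string upperEn = word.ToUpper(english);    // culture-specific: i -> I
string upperInv = word.ToUpperInvariant(); // culture-independent: i -> I

Console.WriteLine(upperTr == "\u0130STANBUL"); // True
Console.WriteLine(upperEn == "ISTANBUL");      // True
Console.WriteLine(upperInv == "ISTANBUL");     // True

// The parameterless overload follows CultureInfo.CurrentCulture.
CultureInfo.CurrentCulture = turkish;
Console.WriteLine(word.ToUpper() == upperTr);  // True
```

Since the rules applied depend on the culture, a timing measured under one culture says little about another; a fair comparison pins the culture explicitly.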

edited Dec 21 '16 at 07:03

answered Dec 20 '16 at 10:23

Sefe


Yes, to be carried out properly, the experiment should specify the culture; something like "why is `ToUpper(new CultureInfo("ka-Ge"))` significantly faster than `ToLower(new CultureInfo("ka-Ge"))`...". And there are two possible answers (for the specified culture): "...because Microsoft has optimized the code for performing uppercase comparisons..." and "this is culture-specific behaviour". – Dmitry Bychenko Dec 20 '16 at 12:14
