9

I am stepping through two different blocks of code to learn the difference in execution between the switch and if statements.

While if seems to check every single condition, switch skips directly to the condition that evaluates to true. How does the compiler know which statement will evaluate to true pre-check? Please see below code:

public void IFvSwitch()
    {
        string x = "f";
        switch (x)  // Compiler skips directly to case "f":
        {
            case "a":
                MessageBox.Show(x);
                break;
            case "b":
                MessageBox.Show(x);
                break;
            case "f":
                MessageBox.Show(x);
                break;
        }

        if (x == "a") // Compiler checks all conditions 
        {
            MessageBox.Show(x);
        }
        else if (x == "b")
        {
            MessageBox.Show(x);
        }
        else if (x == "f")
        {
            MessageBox.Show(x);
        }
    }
Pholoso Mams
  • 467
  • 1
  • 5
  • 19
  • 3
    `switch` basically compiles to a type of table lookup. – juharr May 18 '18 at 13:26
  • @Berger, java vs c# sometimes things could have been different. And as the answers in the duplicate does not state that it's the same . It's hard to label it has dupe. Etiher a question about c# already exist or it's an 'original' question. – Drag and Drop May 18 '18 at 13:31
  • Consider this. In `if` you can write expressions of *arbitrary* complexity, so long as the overall expression produces a `bool`. Then look at the limitations on what you can put in `case`s. – Damien_The_Unbeliever May 18 '18 at 13:33
  • Possible duplicate of [Is there any significant difference between using if/else and switch-case in C#?](https://stackoverflow.com/questions/395618/is-there-any-significant-difference-between-using-if-else-and-switch-case-in-c) – Sudsy1002 May 18 '18 at 13:37
  • @Damien_The_Unbeliever: I see what you're getting at, but your explanation misses the mark a bit. You can still evaluate multiple things in a switch (e.g. `switch(a+b+c) { }`), which can in some cases cover the same need. But a switch with two outcomes (one of which is the default) is effectively no different from an if. – Flater May 18 '18 at 13:42
  • @juharr: It used to. With pattern matching, it's *logically* checking each condition in turn, even if *sometimes* it can be optimized. – Jon Skeet May 18 '18 at 13:44
  • @Flater - I said the *case*, not the *switch*. The switch can compute an arbitrary value. But then you *have* to compare that value *exactly* for equality with one of a number of items. You can't have a `case` label for `>4`, for instance. – Damien_The_Unbeliever May 18 '18 at 13:44
  • 1
    @Damien_The_Unbeliever: `case var _ when value > 4` in C# 7... – Jon Skeet May 18 '18 at 13:46
  • @DaisyShipton - yes, but then those `when`s get evaluated in sequence, just like the `if`s. For the basic `switch`/`case` that exhibits the behaviour the OP is asking about, you have to be performing equality tests only. – Damien_The_Unbeliever May 18 '18 at 13:47
  • @Damien_The_Unbeliever: Note that I said `which can in some cases cover the same need`. If OP is considering complex evaluations in their `case`, it's just as possible that they'll try to implement it in their `switch`. In reality, switches and ifs don't really take in complex evaluations; they only take in the **resulting** value after it is processed. There is no real difference between switches and ifs in this sense (other than the allowed types, since ifs are limited to bools). The only meaningful difference is that a switch allows for multiple outcomes, while an if is limited to (max) two – Flater May 18 '18 at 13:53
  • @Damien_The_Unbeliever: It's still a case label that's checking a non-constant, which you seemed to be saying wasn't possible. I think it's worth thinking about the implementation separately from the logical behaviour. Logically, every `case` is checked in order. From an implementation perspective, the compiler may very well have a lot of opportunities to optimize. – Jon Skeet May 18 '18 at 13:53
  • It depends. Simple switch statements with no more than 2 case statements is done with if-else. More than 2 produces [this](https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.switch(v=vs.110).aspx), jitted to a jump table at runtime. Being able to switching on a string was a recent addition to C#, optimized to switch on the String.GetHashCode() value. – Hans Passant May 18 '18 at 14:20
  • 1
    @DaisyShipton OP's code does not use the new pattern matching and we don't even know what version of C# they are using. TBH I don't know if they still compile non-pattern matching switches to table lookups with C# 7 or not. – juharr May 18 '18 at 14:31
  • @juharr: While it doesn't use pattern matching, it's easy to present a mental model that works for both "old style" cases *and* pattern-based cases... so I think that's the best thing to do. (And I believe that the compiler *does* optimize where it can, which may well be more than just "only if all the cases are constant".) – Jon Skeet May 18 '18 at 15:36

5 Answers5

15

While if seems to check every single condition, switch seems to skip directly to the condition that evaluates to true. How does the compiler know which statement will evaluate to true before pre-check?

Let's start by correcting your use of jargon. The C# compiler translates C# into IL. The runtime has a jit compiler that translates IL into machine code. The C# compiler also generates information that informs the debugger where it should be causing breaks when you are single-stepping.

So it's not exactly clear what your question is. Is the question "how does the C# compiler generate IL to avoid linear search?" or is it "how does the C# compiler inform the debugger that it should skip showing the search portion of the switch codegen?" Or is it "how does the jit compiler generate machine code for a switch?"

Tell you what, I'll handwave a little here and give you vague answers that describe the strategy but not the details. If you have questions about the details then post a more specific question. In particular, I'll use "the compiler" to mean either the C# compiler or the jit compiler, without saying which. Sometimes work is done by the C# compiler, sometimes it defers that work to the jitter, but either way, the work gets done.


Switch codegen is complicated. Let's start with the easy one. The compiler can choose to compile a switch as though it was a bunch of if-else statements, and in practice it often does. It particularly does so if the switch is very small, as in your example.

How does the debugger know to not show you the comparison steps that are really happening? The compiler also generates debugging information that informs the debugger what instructions should be skipped over and which ones should generate automatic breakpoints when single-stepping. So the C# compiler, the jit compiler and the debugger all work together to ensure that the developer has the right experience when stepping on a switch.


What happens if the switch is big? Doing a huge if-else to find the right switch section is potentially slow. Let's again do an easy case and look at a big switch full of strings. In that case the compiler will do the following:

  • Figure out the address of each switch section
  • Generate code which builds a static dictionary from string to int
  • The switch becomes "look up the string in the dictionary; if it is not there, go to the default case or the end of the switch if there is no default. If it is there, go to the address that is in the dictionary"

Now the work of quickly finding the string without checking every one of them is left up to the dictionary, which uses a hash table to cut down on the number of string comparisons by a huge factor.


What if we have a large switch but the cases are integers?

switch(whatever)
{  
  case 1000: … break;
  case 1001: … break;
  … many more …
  case 1999: … break;
  default: … break;
}

In this situation the compiler could make a static array of a thousand switch case addresses, and then generate code like this:

  • If the value is less than 1000 or more than 1999, go to the default case
  • Otherwise, look up the address by dereferencing the array, and go there.

So now we are down to two comparisons and an array dereference; this really does jump directly to the right section.


What about this?

switch(whatever)
{  
  case 1000: … break;
  case 1001: … break;
  … many more …
  case 1999: … break;
  case 2001: … break;
  default: … break;
}

There's no case 2000. In that case the compiler can generate a jump table as before, with 2002 entries, and in the 2000 slot, it just puts the address of the default section.

What about this?

switch(whatever)
{  
  case 1000: … break;
  case 1001: … break;
  … many more …
  case 1999: … break;
  case 1000001: … break;
  default: … break;
}

Now we are in an interesting case! The compiler will not generate a jump table with 1000002 entries, most of which are "go to the default case". In this case the compiler could generate a hybrid switch:

  • If the value is less than 1000 go to the default case
  • If the value is 1000001, go to its case
  • If the value is greater than 1999 go to the default case
  • Otherwise, check the jump table

And so on. I'm sure you can see how this can get very complicated when the switches are large and there are many dense ranges of values.

There has been a lot of work in the optimization community on what precisely is the best balance for creating hybrid switch strategies.

There is also lots of opportunity for things to go wrong. I'll end with a war story from the 1990s. The MSVC compiler back then -- I think it was version 4 -- had a bug where in my particular program, the hybrid jump table was generated incorrectly. Basically, it broke up a bunch of contiguous ranges into individual jump tables when it didn't need to.

Normally that would not be a problem, but this particular switch was the innermost loop of the logic in the Microsoft VBScript and JScript engines that dispatched the next instruction in the script. (VBScript and JScript at the time compiled to a proprietary bytecode language.) The bad switch was killing performance of the script engines.

The MSVC team could not prioritize fixing the bug fast enough for us, so we ended up writing heroic macro code that would let us express the switch statement in a way that looked reasonable to read, but behind the scenes the macros would generate the appropriate jump table!

Fortunately you can rely on the C# compiler and the jitter to generate good code for you. You shouldn't have to worry about what code is generated for a switch.

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
  • The details here are precisely why I've broken my answer down into the logical model of "You can think of it as testing each case in turn" and "but the compiler may well be able to optimize" :) – Jon Skeet May 18 '18 at 15:38
3

Disclaimer
As others have commented, the underlying compiled IL is different from how you see your breakpoint move. Under the hood, it is actually checking each value one after the other.
However, your breakpoint doesn't show that.

Reading your question, I think the most important thing to notice here is that you're assuming your if else if else if structure to be one block; which it isn't. My answer focuses on this incorrect expectation from a logical perspective, rather than digging into the IL.

You're missing an important clue here:

    if (x == "a") // Compiler checks all conditions 
    {
        MessageBox.Show(x);
    }
    else if (x == "b")
    {
        MessageBox.Show(x);
    }
    else if (x == "f")
    {
        MessageBox.Show(x);
    }

These are all separate if statements, which are stacked one after the other (in each other's else case). They are executed separately because they are separate steps!

Your example has very similar if evaluations, but you could've done wildly different things here:

    if (x == "a") // Compiler checks all conditions 
    {
        MessageBox.Show(x);
    }
    else if (IsElvisStillAlive())
    {
        MessageBox.Show(x);
    }
    else if (sonny + cher == love)
    {
        MessageBox.Show(x);
    }

One evaluation says nothing about the next evaluation. They are not connected in any way.


Note that the indentation (and omitting curly braces) can help showcasing that point. Your indentation hides the fact that these are nested steps.

    if (x == "a") // Compiler checks all conditions 
    {
        MessageBox.Show(x);
    }
    else 
    {
        if (x == "b")
        {
            MessageBox.Show(x);
        }
        else 
        {
            if (x == "f")
            {
                MessageBox.Show(x);
            }
        }
    }

This revised indentation, while needlessly verbose (although some like it that way), is much clearer on explaining why these are separate steps.


Notice how you're not asking how an if knows how to go to the first or second case. That's fairly obvious.
A switch is, effectively, an if with more than two outcomes.


Think of it this way: A switch only needs to take one step:

                **********
                * SWITCH *
                **********
     _________ _/ |    |  \ _________      
    /             |    |             \  
********    ********  ********    ********
* CASE *    * CASE *  * CASE *    * CASE *
********    ********  ********    ******** 

But your if falls through 3 separate if cases:

        ******                             
        * IF *                             
        ******  
    ______|______
    |            |
********    ********  
* THEN *    * ELSE *  
********    ********    
                 |
               ******                      
               * IF *                      
               ******  
           ______|______
           |           |
       ********    ********  
       * THEN *    * ELSE *  
       ********    ********      
                      |
                   ******                  
                   * IF *                  
                   ******  
               ______|______
              |             |
           ********    ********  
           * THEN *    * ELSE *  
           ********    ********  

You're right that they syntactically look very similar, but they are very different when you observe their logical flow.

Flater
  • 12,908
  • 4
  • 39
  • 62
  • 1
    How you made those "switch | case" and "if | else" diagrams !! – Yogi May 18 '18 at 13:41
  • 4
    @Yogi: I'm a 90's kid, ASCII art was my jam for a few years :) Basically, open notepad, make a bunch of lines filled with spaces, press [Insert] (to toggle to **overwrite** mode), and then start creating. It takes a lot of fidgeting. After a while, you get a reflex for it. – Flater May 18 '18 at 13:44
  • "A switch only needs to take one step" - that's not the case for switches with patterns which may match more than one case; those need to be checked in order. – Jon Skeet May 18 '18 at 13:45
  • @DaisyShipton: Fair point, but that's not what OP is focusing on here. – Flater May 18 '18 at 13:47
  • @Chris: It's literally impossible to know if something matches an element (from a list of possible elements) without checking this list's elements one by one . If you're looking at it on such a microscopic level, then _everything_ is sequential at some point. But this isn't what OP is focusing on. He's asking why his **breakpoint** skips ahead in a `switch` case. He's not asking about IL. – Flater May 18 '18 at 13:49
  • @Flater: I guess I read the question as him wanting to know how the switch worked rather than how the debugger worked but I see that is just as valid a way of looking at the question. You have persuaded me to give you a +1. ;-) – Chris May 18 '18 at 13:53
  • @Flater: While it's not what the OP is focusing on here, I think it makes sense for them to adopt a logical model that doesn't need to be changed when they encounter a case with a non-constant pattern in. Let's save people learning C# from having to change models :) – Jon Skeet May 18 '18 at 13:55
  • 1
    @Chris: I added a disclaimer to mention your interpretation. It's a bit ambiguous: OP is assuming that the IL and the breakpoint follow the same logical progression; which is not the case. Removing that notion; whether he wants to focus on one or the other is not obvious from the question. – Flater May 18 '18 at 14:05
  • I've just updated my answer after reading up on it some more - it seems that the comparing each in turn isn't what it *always* does, just what it did in this case. Sometimes it uses a dictionary internally so in fact does do exactly what the OP said and jumps directly to the right branch of code (having executed a bunch more code to be able to do that). Apparently compilers are complicated and clever. :) – Chris May 18 '18 at 14:06
  • @Chris: `Apparently compilers are complicated` Understatement of the century right there :) – Flater May 18 '18 at 14:07
  • @Flater: I think we'll have to agree to disagree there. I don't want to put a mistake *deliberately* in the way of someone learning. – Jon Skeet May 18 '18 at 14:22
  • @Flater: I'm talking about giving users a model which isn't general-purpose, knowing that they'll need to change their mental model later. How is that better than giving a correct general-purpose mental model which works in all cases? – Jon Skeet May 18 '18 at 14:30
  • @DaisyShipton: "a correct general-purpose mental model which works in all cases" is a loaded concept. If this alleged model were objectively better than others in all cases; then there would never be a reason to use anything _but_ this model. More often than not, different approaches exist because they have different benefits and drawbacks. So if this general purpose model only applies in _some_ cases and not _all_ cases (where another model is better), then it's best to teach users _both_ models, including understanding when to use which. – Flater May 18 '18 at 14:38
  • @Flater *It's literally impossible to know if something matches an element (from a list of possible elements) without checking this list's elements one by one* But that's exactly what a *classic* C# `switch` does. Or at least *can* do. – Rawling May 18 '18 at 14:41
  • @Rawling: I think you're misinterpreting what I said, or how I intended it. As a simple example, comparing two int values **sequentially** compares the int valuess **bit by bit**. But that is overly microscopic when we're talking about most use cases of int comparison. For the purposes of everyday coding, an int evaluation can be considered as a single logical step, regardless of the existence of an underlying sequence of bit evaluations. Similarly, my answer focuses on the top level logic, not the underlying IL (as per Chris' comment); hence my reply to him. – Flater May 18 '18 at 14:45
  • @Flater: So what's the benefit **in this concrete situation** of having two different models? Do you think all C# developers should learn the implementation details of what the C# compiler can optimize in what situations? – Jon Skeet May 18 '18 at 15:41
  • @DaisyShipton: My answer never delved into this. OP never talked about a `switch` with `case when`. You're the only one who brought raised the topic. This question doesn't even have a concrete situation (`x == "a"` isn't particularly meaningful); so I'm wondering what concrete situation you're referring to. I can also turn the question on you: do _you_ think that compiler optimization is the only reason why a developer would need to pick a particular pattern? This is a discussion that doesn't add to the posted question. – Flater May 18 '18 at 15:50
  • @Flater: I'm referring to the concrete situation of "Describe what a switch statement does." But I agree that our disagreement isn't adding to anything, and it doesn't look like I'm going to convince you, so I won't comment any further. It might at least be worth noting in your answer that your general statement of "A switch only needs to take one step" is *not* universally true, even if you don't go into any more detail than that. – Jon Skeet May 18 '18 at 15:56
2

In linqpad 4 I compiled the following method to look at the underlying IL:

void sw()
{
        string x = "f";
        switch (x)  // Compile skips directly to  case "f":
        {
            case "a":
                Console.WriteLine(x);
                break;
            case "b":
                Console.WriteLine(x);
                break;
            case "f":
                Console.WriteLine(x);
                break;
        }
}

The IL generated was as follows:

IL_0000:  nop         
IL_0001:  ldstr       "f"
IL_0006:  stloc.0     // x
IL_0007:  ldloc.0     // x
IL_0008:  stloc.1     // CS$4$0000
IL_0009:  ldloc.1     // CS$4$0000
IL_000A:  brfalse.s   IL_0050
IL_000C:  ldloc.1     // CS$4$0000
IL_000D:  ldstr       "a"
IL_0012:  call        System.String.op_Equality
IL_0017:  brtrue.s    IL_0035
IL_0019:  ldloc.1     // CS$4$0000
IL_001A:  ldstr       "b"
IL_001F:  call        System.String.op_Equality
IL_0024:  brtrue.s    IL_003E
IL_0026:  ldloc.1     // CS$4$0000
IL_0027:  ldstr       "f"
IL_002C:  call        System.String.op_Equality
IL_0031:  brtrue.s    IL_0047
IL_0033:  br.s        IL_0050
IL_0035:  ldloc.0     // x
IL_0036:  call        System.Console.WriteLine
IL_003B:  nop         
IL_003C:  br.s        IL_0050
IL_003E:  ldloc.0     // x
IL_003F:  call        System.Console.WriteLine
IL_0044:  nop         
IL_0045:  br.s        IL_0050
IL_0047:  ldloc.0     // x
IL_0048:  call        System.Console.WriteLine
IL_004D:  nop         
IL_004E:  br.s        IL_0050
IL_0050:  ret         

Broadly the bit you are interested in starts at IL_000C where it loads up the switch variable, then loads the static value "a" (IL_000D), compares them (IL_0012) and then if true jumps to IL_0035 (IL_0017). It then repeats this for each case statement. After the last case statement it jumps to the end (skipping past all the code that was in each case) (IL_0033).

So your observation that "'Switch' seems to skip directly to the condition that evaluates to true" isn't actually true. When stepping through a debugger it may seem like this but this is not representing how the underlying compiled code works.

I should note that this is now always how the switch statement will be compiled, just that this is how this particular one is getting compiled.

If you increase the number of case statements then it will implement it using a jump table in this case it creates a private dictionary with your case strings as the key and then an index as the value. It then looks up the switch variable in that dictionary and uses that in a jump table to work out where in the IL to move execution to. So in this case it would be able to go directly to the right branch in the same way that when looking up a value in a dictionary it doesn't need to check every single name/value pair.

Is there any significant difference between using if/else and switch-case in C#? (thanks to @Sudsy1002 for linking it) discusses some of these things in more detail.

Chris
  • 27,210
  • 6
  • 71
  • 92
  • Sudsy's comment about a possible dupe suggested that the compiler may do different things with larger switches and may do different things in debug vs release. Are you saying that the compiler will now always go down the chained evaluation route these days? – Damien_The_Unbeliever May 18 '18 at 13:51
  • @Damien_The_Unbeliever: I'll confess to not knowing enough about the compiler to say that. I was just demonstrating what this particular case does. I'll edit my answer to make that clearer. This one exmaple is however enough to contradict the assertion that "the switch skips straight to the right branch" which was the point I was trying to make. – Chris May 18 '18 at 13:52
  • @Damien_The_Unbeliever: In fact having just played around a bit if I increase the number of case statements in my switch (to 8 in my test) then it did use a jump table. I'll update again. :) – Chris May 18 '18 at 13:59
1

TL;DR: Logically, each case in the switch statement is checked in turn, and the first one to match "wins". As an implementation detail, the compiler is very often able to optimize this.

When every "case" is a constant, the compiler may well be able to generate a jump table to optimize things. That will always logically give the same result as checking each case in turn. How it creates that table may well depend on the switch type - switching over integers is simpler than switching over strings, for example.

When cases contain patterns as introduced in C# 7, the compiler may not be able to create a jump table. Here's an example to prove that multiple case labels can be checked in a single switch statement:

using System;

class Program
{
    public static void Main()
    {
        object obj = 200;
        switch (obj)
        {
            case int x when Log("First when", x < 10):
                Console.WriteLine("First case");
                break;
            case string y when Log("Second when", y == "This won't happen"):
                Console.WriteLine("Second case");
                break;
            case int z when Log("Third when", true):
                Console.WriteLine("Third case");
                break;                
        }
    }

    static bool Log(string message, bool result)
    {
        Console.WriteLine(message);
        return result;
    }
}

The output is:

First when
Third when
Third case

The "Second when" message isn't logged because the start of the pattern isn't matched: we don't get as far as the guard clause (the when). But execution may still have had to check that. It's possible that the implementation would effectively switch on the type, and know that if obj is an int, it only needs to check the first and third patterns.

Basically, as of C# 7, the compiler has a lot of latitude to optimize - and is very, very likely to do that when all the case labels are constants - but logically it needs to check one case at a time.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
0

I remember someone saying that a switch with string cases internally gets converted to an if-else structure. Basically saying that it's just syntactic sugar.

switch statements with ints/enums might actually work differently (and thus faster than if/else)

ManIkWeet
  • 1,298
  • 1
  • 15
  • 36