7

I run into this scenario frequently. At first glance, I think, “That’s poor coding; I’m executing a method twice and necessarily getting the same result.” But upon thinking that, I have to wonder if the compiler is as smart as I am and can come to the same conclusion.

var newList = oldList.Select(x => new Thing {
    FullName = String.Format("{0} {1}", x.FirstName, x.LastName),
    OtherThingId = x.GetOtherThing() != null : x.GetOtherThing().Id : 0 // Might call x.GetOtherThing() twice?
});

Does the behavior of the compiler depend on the contents of the GetOtherThing method? Say it looks like this (somewhat similar to my real code right now):

public OtherThing GetOtherThing() {
    if (this.Category == null) return null;
    return this.Category.OtherThings.FirstOrDefault(t => t.Text == this.Text);
}

That will, barring very poorly handled asynchronous changes to whatever store these objects are coming from, definitely return the same thing if run twice in a row. But what if it looked like this (nonsensical example for the sake of argument):

public OtherThing GetOtherThing() {
    return new OtherThing {
        Id = new Random().Next(100)
    };
}

Running that twice in a row would result in the creation of two different objects, with different Ids in all likelihood. What would the compiler do in these situations? Is it as inefficient as it seems to do what I showed in my first listing?

Doing some of the work myself

I ran something very similar to that first code listing and put a breakpoint in the GetOtherThing instance method. The breakpoint was hit once. So, it looks like the result is indeed cached. What would happen in the second case, where the method might return something different each time? Would the compiler optimize incorrectly? Are there any caveats to the result that I found?

EDIT

That conclusion was invalid. See comments under @usr’s answer.

Andrew
  • 4,145
  • 2
  • 37
  • 42
  • I believe the method result is stored on the local call stack and referenced rather than calling the method twice, when compiler optimizations are enabled. – Haney Feb 18 '14 at 20:53
  • @DavidHaney Why do you believe that? Your talking about the stack describes how the IL might look like if the optimization was performed, but not whether this optimization is valid and possible. –  Feb 18 '14 at 20:55
  • As you said, the method result can be different between invocation, and because of that compiler can't optimize it to just one call. However, it really likely that the optimization will be performed by JIT when you actually run the code. – MarcinJuraszek Feb 18 '14 at 20:56
  • @DavidHaney that sounds like an unlikely and behavior-affecting optimization. Can you provide a citation for that? – bzlm Feb 18 '14 at 20:56
  • @MarcinJuraszek *Why* is it likely the JIT will perform this optimization? While the JIT can easily inline the methods involved, I don't see a way for it to rule out concurrent modification of `x.Category.OtherThings`, which would affect the result. –  Feb 18 '14 at 20:57
  • See this for some explanation. As far as I know the compiler will often reduce `ternary` to `if-else` equivalent. http://stackoverflow.com/a/17330596/2420979 – Haney Feb 18 '14 at 21:01
  • @delnan Such optimisations are allowed, even when neither the compiler nor the JITter can prove that no other thread will make any relevant modifications. You need some form of thread synchronisation to prevent that. (That said, I certainly don't intend to claim that the optimisation *will* be performed.) –  Feb 18 '14 at 21:07
  • @delnan Check this answer: http://stackoverflow.com/a/20832053/1163867 JIT does a lot of optimizations, I would expect it to optimize your repeated call as well. – MarcinJuraszek Feb 18 '14 at 21:07
  • @MarcinJuraszek I'm perfectly aware of CSE. But CSE, like virtually all optimizations, can only apply if some invariants hold (e.g. the duplicated expression has no side effects and evaluates to the same thing in both places) to avoid changing the program's behavior. If an optimizer can't prove these invariants, it won't perform the optimization. –  Feb 18 '14 at 21:11
  • @hvd Possible, I'm not very familiar with the CLR's memory model. The JIT would still have to inline *all* intervening code involved to rule out modification from the same thread, but that's much more feasible and has other advantages too. –  Feb 18 '14 at 21:13
  • @delnan: Thanks for the feedback on my (now deleted) answer. I do appreciate it. – Robert Harvey Feb 18 '14 at 21:15
  • Why the downvote? I thought this was a well-formed question with clear examples, and it demonstrated initiative toward figuring out an answer. – Andrew Feb 18 '14 at 22:07

2 Answers2

12

There are two compilers to consider here: the C# compiler that turns C# into IL, and the IL compiler that turns IL into machine code -- called the jitter, because it happens Just In Time.

The Microsoft C# compiler certainly does no such optimization. Method calls are generated as method calls, end of story.

The jitter is permitted to perform the optimization you describe provided that doing so cannot be detected. For example, suppose you had:

y = M() != 0 ? M() : N()

and

static int M() { return 1; }

The jitter is permitted to turn this program into:

y = 1 != 0 ? 1 : N()

or for that matter

y = 1;

Whether the jitter does so or not is an implementation detail; you'll have to ask an expert on the jitter whether it actually does perform this optimization if you care.

Similarly, if you had

static int m;
static int M() { return m; }

then the jitter could optimize that into

y = m != 0 ? m : N()

or even into:

int q = m;
y = q != 0 ? q : N();

because the jitter is permitted to turn two field reads in a row with no intervening write into a single field read, provided that the field is not volatile. Again whether it does so or not is an implementation detail; ask a jitter developer.

However, in your latter example the jitter cannot elide the second call because it has a side effect.

I ran something very similar to that first code listing and put a breakpoint in the GetOtherThing instance method. The breakpoint was hit once.

That is highly improbable. Almost all optimizations are turned off when you are debugging, precisely so that it is easier to debug. As Sherlock Holmes never said, when you eliminate the improbable, the most likely explanation is that the original poster was mistaken.

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
9

The compiler can only apply optimizations if you cannot tell the difference. In your "random" example you can clearly tell the difference. It cannot be "optimized" this way. It would violate the C# spec. In fact the spec does not talk about optimizations a lot. It just says what you should observe the program do. In this case, it specifies that two random numbers should be drawn.

In the first example, it might be possible to apply this optimization. It will never occur in practice. Here are some things that make it hard:

  • The data the query operates on could be changed by a virtual function call of yours, or your lambda (t => t.Text == this.Text) could change the list. Very insidious.
  • It could be changed by another thread. I'm not sure what the .NET memory model says about this.
  • It could be changed by reflection.
  • It must be provable that the computation will always return the same value. How would you prove that? You would need to analyze all code that could possibly run. Including virtual calls and data-dependent control-flow.

All of this has to work across non-inlined methods and across assemblies.

The C# compiler cannot do this because it cannot look into mscorlib. A patch release might change mscorlib at any time.

The JIT is a poor JIT (alas) and it is optimized for speed of compilation (alas). It does not do this. If you are in doubt whether the current JIT will do some advanced optimization or not, it is a safe bet that it won't.

usr
  • 168,620
  • 35
  • 240
  • 369
  • 1
    This sounds like a very good answer, so how would you explain the fact that my breakpoint was only hit once? It would seem that the optimization did take place. – Andrew Feb 18 '14 at 21:25
  • 1
    Maybe you went to the `: 0` branch, or you made some other mistake. Place a `Console.WriteLine` call into all 3 places (`Write() ? Write() : Write()` for some useful Write function). You'll always see two calls. In Release mode sometimes breakpoints are not hit due to optimizations (unrelated to the alleged optimization discussed here). – usr Feb 18 '14 at 21:29
  • 1
    Oh, duh, you're right, it was the "false" side of the ternary statement in this case. Sadly my code structure has changed in the past half hour so it's not conducive to test this further at the moment. I'll take your word for it and the next time this comes up I'll be sure to test it more. Thanks! – Andrew Feb 18 '14 at 22:02