4

TL;DR: Can I somehow create an algorithm that can use different functionality in the inner loop, and still get that "functionality" inlined, without resorting to copy/paste or if/else statements?

I am trying to create an algorithm that basically looks like this:

for(var i=0; i<big; i++) {
  for(var j=0; j<big2; j++) {
     // ... processing
     var x = SomeFunc(a, b, c);
     // ... more processing
  }
}

I want the algorithm to be run for a number of possible functions (SomeFunc above), each called a large number of times per run. Each SomeFunc function is very simple (usually an arithmetic expression).

Now, to get acceptable performance out of this algorithm, it is imperative that SomeFunc is inlined. However I fail to get the function inlined while still allowing for multiple functions.

I realize this means that the algorithm function has to be JITted multiple times, but I was hoping that a construct like this would work:

interface ISomeFunc {
   int SomeFunc(int a, int b, int c);
}

private sealed class SomeFunc1 : ISomeFunc {
   public int SomeFunc(int a, int b, int c) {
     return ....;
   }
}


private static void RunMyGenericAlgo<T>(T func) where T : ISomeFunc
{
    for ... for ..
       x = func.SomeFunc(a, b, c);
}

But it appears that the function call is not inlined since func above is called via the interface and not via the sealed class.

I also tried the obvious approach:

abstract class MyAlgo {
   protected abstract int SomeFunc(int a, int b, int c);
   public void Run() {
     // as RunMyGenericAlgo above
   }
}

sealed class MyAlgoSomeFunc1 : MyAlgo {
   protected override int SomeFunc(int a, int b, int c) {...}
}

and it did not inline either.

This program will however inline as desired (and runs about 50% faster):

class MyAlgo {
   int SomeFunc(int a, int b, int c) {...}
   public void Run() {
     // as above
   }
}

EDIT: To clarify, I also investigated using the MethodImpl attribute with AggressiveInlining, and it did not seem to help.

Is what I'm trying even possible in C# or do I have to copy/paste my algorithm for each implementation of my inner function?

Krumelur
  • 31,081
  • 7
  • 77
  • 119
  • 1
    possible duplicate of [Inline functions in C#?](http://stackoverflow.com/questions/473782/inline-functions-in-c) – dav_i Aug 12 '14 at 10:38
  • 3
    The code you've given isn't even valid, as `SomeFunc1.SomeFunc` would need to be public (or explicitly implement the interface). I'd advise you to use delegates though. And no, this doesn't mean the algorithm would be JITted several times... it would do if `SomeFunc1` were a struct.) Do you have any indication that inlining is *really* significant in this case? – Jon Skeet Aug 12 '14 at 10:38
  • @dav_i I don't think that question is a duplicate, as it deals with inlining in general (and seems to confuse inlining with lambdas). – Krumelur Aug 12 '14 at 10:41
  • @JonSkeet Yes, I have made measurements. Inlining (copy+pasting code and using nonvirtuals only) saves about 50% wall clock time. Sorry about the pseudocode, I tried to keep it only as an illustration. My test program works, of course, but it is too long to post. – Krumelur Aug 12 '14 at 10:43
  • 1
    @Krumelur - I thing "50% wall clock time" is way off the mark. The difference should be very little. – Enigmativity Aug 13 '14 at 07:42
  • 50% is way over the mark. Are you using arrays inside SomeFunc or something? Is it possible that the "manually inlined" method takes advantage of some other optimization? It might be that the parameters are passed through the stack rather than through registers, if you really aren't doing much work, that might lead to such huge difference. However, in that case, repeating the cycle code is probably worth it - if it does make enough of a difference, just go for it. – Luaan Aug 13 '14 at 07:55
  • The inner functions are really tiny, and called a *lot*. I don't think 50% is way off the mark, since each inner function will probably only JIT to a few instructions. – Krumelur Aug 17 '14 at 11:05

1 Answers1

2

To allow a method to be inlined, the implementation must be constant (e.g. not dependant on variables). So any form of a virtual/abstract/interface/delegate call is by definition a call that can never be inlined. Therefore, the only way is to have a nonvirtual method call, or just paste the code in there.

There are some exceptions to this rule. For example, the JVM designers have the problem of all the Java methods being virtual by default, they have virtual call inlining. This will do something like:

//calling obj.MyVirtCall();
if (obj.Type == MyCommonType) {
    //inlined code for MyCommonType.MyVirtCall();
} else {
    obj.MyVirtCall();
}

Edit: You could used T4 templates to generate C# code for each override of Run in your algorithm example, not requiring duplicate code, however it could make the maintenance slightly more complex, having to also maintain a T4 template instead of just C# code.

Bas
  • 26,772
  • 8
  • 53
  • 86
  • 1
    Actually, .NET does that too. And just like in Java, it only really works when one of the implementations clearly dominates (I think the figure was something like 500:1 calls to the specific method vs. virtual or so). – Luaan Aug 13 '14 at 07:50
  • Copy/paste it is, then. – Krumelur Aug 17 '14 at 11:04