What is the overhead for a method call in a loop?

Question

I’ve been working on a C# maze generator for a while which can generate mazes of like 128000x128000 pixels. All memory usage is optimized already so I’m currently looking at speeding the generation up.

A problem (well more off an interest point) I found was the following (just some example code to illustrate the problem):

This code runs in about 1.4 seconds on my machine when pixelChanged is null:

public void Go()
{
    for (int i = 0; i < bitjes.Length; i++)
    {
        BitArray curArray = bitjes[i];
        for (int y = 0; y < curArray.Length; y++)
        {
            curArray[y] = !curArray[y];
            GoDrawPixel(i, y, false);
        }
    }
}

public void GoDrawPixel(int i, int y, Boolean enabled)
{
    if (pixelChanged != null)
    {
        pixelChanged.Invoke(new PixelChangedEventArgs(i, y, enabled));
    }
}

Where the following code runs actually 0.4 seconds faster

public void Go()
{
    for (int i = 0; i < bitjes.Length; i++)
    {
        BitArray curArray = bitjes[i];
        for (int y = 0; y < curArray.Length; y++)
        {
            curArray[y] = !curArray[y];
            if (pixelChanged != null)
            {
                pixelChanged.Invoke(new PixelChangedEventArgs(i, y, false));
            }
        }
    }
}

It seems that when just calling an “empty” method takes about 20% of the cpu this algorithm uses. Isn’t this strange? I’ve tried to compile the solution in debug and release mode but haven’t found any noticeable differences.

What this means is that every method call I have in this loop will slow my code down by about 0.4 seconds. Since the maze generator code currently consist of a lot of seperate method calls that excecute different actions this starts to get a substantial ammount.

I've also checked google and other posts on Stack Overflow but haven't really found a solution yet.

Is it possible to automatically optimize code like this? (Maybe with things like project Roslyn???) or should I place everything together in one big method?

Edit: I'm also interested in maybe some analysis on the JIT/CLR code differences in these 2 cases. (So where this problem actually comes from)

Edit2: All code was compiled in release mode

score 5 · Accepted Answer · answered Oct 30 '12 at 09:17

5

It is a problem, JIT has an inline optimization for methods (where the whole method code is actually injected inside the calling parent code) but this only happens for methods that are compiled to 32 bytes or less. I have no idea why the 32 byte limitation exists and would also like to see an 'inline' keyword in C# like in C/C++ exactly for these issues.

answered Oct 30 '12 at 09:17

Gilad

2,876
5
29
40

I've never heard of the inline keyword but it seems to do exactly what I need :), I hope Microsoft can add it to C# someday too. – Devedse Oct 30 '12 at 09:34
I've tried to put [MethodImpl(MethodImplOptions.AggressiveInlining)] in front of my GoDrawPixel(......) method with no avail. It is still 0.4 seconds slower then the direct implementation. – Devedse Oct 30 '12 at 10:14
I'm using that, but it's not making any difference :/ – Devedse Oct 30 '12 at 10:33
I hope someone can help me figure this one out because this seems to be the most promising answer, if it works. – Devedse Oct 30 '12 at 10:47
Did you set the compilation to "Release" ? If you ever figure this out write it in your question as this also interests me. – Gilad Oct 30 '12 at 12:58
Yes I have. I have also tried to compile my code with mono where it actually ran faster the way Marc Gravell said compared to the direct way (which makes no sense whatsoever) – Devedse Oct 30 '12 at 13:23
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/18855/discussion-between-devedse-and-gilad) – Devedse Oct 31 '12 at 11:15
I've solved this problem with the [MethodImpl(MethodImplOptions.AggressiveInlining)] thing above my method. The speed increase wasn't really noticable untill I ran the program with shift+f5 instead of f5 in release mode (this doesn't attach the debugger). In this case, the inlined code was almost 2 times as fast as the not inlined one. – Devedse Nov 29 '12 at 18:16
@Gilad do you have a reference/source for the 32 byte figure? I'd like to read more about it. – rollsch Jan 16 '18 at 00:35

Marc Gravell · Answer 2 · 2012-10-30T09:34:07.190

5

The first thing I would try would be making it static rather than instance:

public static void GoDrawPixel(PixelChangedEventHandler pixelChanged,
    int x, int y, bool enabled)
{
    if (pixelChanged != null)
    {
        pixelChanged.Invoke(new PixelChangedEventArgs(x, y, enabled));
    }
}

this changes a few things:

the stack semantics remain comparable (it loads a reference, 2 ints and a bool either way)
the callvirt becomes call - which avoids some minor overheads
the ldarg0/ldfld pair (this.pixelChanged) becomes a single ldarg0

the next thing I would look at is PixelChangedEventArgs; it might be that passing that as a struct is cheaper if it avoids a lot of allocations; or perhaps just:

pixelChanged(x, y, enabled);

(raw arguments rather than a wrapper object - requires a change in signature)

edited Oct 30 '12 at 09:34

answered Oct 30 '12 at 09:18

Marc Gravell

1,026,079
266
2,566
2,900

I've already looked into the pixelChanged(x, y, enabled) which indeed sped things up a bit. Can you maybe explain a bit what you mean by the callvirt and call differences? and the ldarg0/ldfld pair? – Devedse Oct 30 '12 at 09:29
@Devedse in the instance method, you access `this.pixelChanged` twice where-as if we pass that in a parameter if it only done once; once the reference is on stack as a local or parameter (arg0 in this case), it is **marginally** quicker to get at; besides, making it a parameter is a necessity if we make it static rather than instance. – Marc Gravell Oct 30 '12 at 09:33
I've tried your code and now it's about 0.25 seconds slower instead of 0.4. We're improving :) – Devedse Oct 30 '12 at 10:31

score 3 · Answer 3 · answered Oct 30 '12 at 09:15

3

Is this in debug or release mode? Method calls are fairly expensive, but they may be inlined when you build/run it in release mode. It will not get any optimization from the compiler in debug mode.

answered Oct 30 '12 at 09:15

René Wolferink

3,558
2
29
43

I've tried it in both release and debug. Both were about 0.4 seconds slower in the case of calling a method. – Devedse Oct 30 '12 at 09:32
@Devedse: How did you inline the methods? – Luis Filipe Oct 30 '12 at 09:39
@Luis Filipe: I manually copied the code as you can see in the main topic. – Devedse Oct 30 '12 at 09:49

score 1 · Answer 4 · answered Oct 30 '12 at 10:06

The main overhead is, as Marc said, making that virtual call and passing the arguments. Can the value of PixelChanged change during the execution of method? If not this might work (I'm not completely sure JIT optimizes the empty action delegate into a nop, you'll have to test that on your own (if it doesn't I'll just disregard good practices here and just make to calls, one with pixelChanged.Invoke called and one without (inline) and just call whatever suits best... after all sometimes you have to make the code a little less elegant in order for it to be fast).

public void Go()
{
  if (pixelChanged != null)  
     GoPixelGo((x,y,z) => { });  
  else
     GoPixelGo((i, y, enabled) => pixelChanged.Invoke(i, y, enabled));
}

public void GoPixelGo(Action<int, int, bool> action)
{
  for (int i = 0; i < bitjes.Length; i++)
  {
      BitArray curArray = bitjes[i];
      for (int y = 0; y < curArray.Length; y++)
      {
         curArray[y] = !curArray[y];
         action(i,y, false);
      }
  }
}

That took a while for me to understand, but I indeed see what you're doing now. I'll try this out too. — Devedse, Oct 30 '12 at 10:22
I guess it depends which case you need to optimize for. For the "pixelChanged is null" case, I would expect the null-test to be faster, due to the cost of delegate-invoke (fast, but not as fast as a null-test). For the "pixelChanged not null" case, then indeed removing the null-test might be helpful. It really depends which scenario is the most likely. — Marc Gravell, Oct 30 '12 at 10:39
I've tried this and it's also about 0.25 seconds slower then the direct way. — Devedse, Oct 30 '12 at 10:44

What is the overhead for a method call in a loop?

4 Answers4

Linked