23

I am getting an unexpected spike in my C# application when rendering frames. I have been going over it in a profiler and I noticed the following:

  • When vsync is on, the program seems to relinquish control to the OS (or does something) when swapping the buffers... and somehow stays very consistent in its rendering. This is good because it renders the frames smoothly and without this weird microstutter. The image below is showing what is perceived to be perfectly fluid. This is our case of 'it works perfectly by rendering smoothly without any stuttering'.

VSync image

  • When vsync is off and the program does nothing but writing as fast as it possibly can without any breaks, even though the frames per second is higher, there seems to be these large gaps where nothing is being done. This I am pretty sure is the microstutter I am noticing in game because it will cause the FPS to dramatically drop beneath 60fps. This following image shows something going wrong:

Microstutter

In the above, the delta time is 21ms (which is past the 16.6ms for 60 frames per second) and causes a noticeable stutter in game. It's even worse than this because it only starts rendering after it (as you can see by the solid rectangle after) so this 21ms is a lie, it's more like 40ms according to the graph which is horrible.

This large gap never happens when vsync is on for reasons I do not understand. For those unfamiliar with vsync and games, you can't use vsync in first person shooters because it cripples the input handling due to how it works, so I cannot use vsync and I must investigate the non-vsync version. I need to find why the non-vsync version has these major stalls though.

My question is, how do I know what is causing this extended delay from the image you see above?

The performance profiler says there's a significant amount of waiting time here, upwards of 80% waiting with 20% CPU usage (compared 100% CPU usage when preparing the rendering data).

The profiler also shows that there's no rendering code being run whatsoever in that loop... which is weird because the renderer dominates the performance almost completely, so without any capping of the frames per second it should be inundating the entire graph with a solid blue rectangle.

The problem is that the code the profiler shows is just calls into the polling input and dlls in the area I selected above:

DLL calls

Note that the other call of DispatchRenderFrame is making an OpenGL call, which when I completely remove has no effect on the program, so you can ignore that. This might mean that the user input seen below also has no effect on the microstuttering issue... but I can't remove that since it is part of the library I'm using for window management (OpenTK).

I'm not sure what the CLR Worker thread is or what it is doing. It also happens in the vsync one (desired smooth profiling one) so even though I don't know whether or not it is a culprit here, my guess is it probably is not but I'm not sure because it shows up on the 'desired vsync example' too at the same general location.

Is there some interrupt happening and the OS is taking over but not reinstating my thread because it's classified it as a CPU hog? Just a thought... but then again it shows blue bars in the example so I assume the Main thread isn't actually asleep in what I highlighted and is in fact running?

As you can see, I'm not sure on what the time slice I highlighted is telling me, and is what I need help with. I don't know why the waiting percentage is so high at this point in the code, or where to go from here to diagnose the problem further.

EDIT: I did some rough profiling with the Stopwatch class to see which locations were causing the spikes, and the spikes emanate from only the calculation part. None of the OpenGL invocations cause any lag. It is purely happening inside a function of functions that do math and access data, or write into a pre-allocated array of structs.


Extra notes:

This also has me curious if there is a way so I can force it such that the C# virtual machine gets as much CPU usage as possible?

There is no garbage being generated. This is not a GC issue. I can't afford to have the GC run during the application so I do not generate garbage. Everything in the rendering functions are all structs on the stack, the only time it goes to any heap managed object is a pooled array with more than a large enough size for all the render data. The only time it shows up in the profiler is at the beginning and end, but it does not run during the rendering phase.

Water
  • 3,245
  • 3
  • 28
  • 58
  • well, you've focused and investigate the code, but have you investigate the hardware that is running on ? re-test it again, check the CPU, Memory and GPU usage, see if there is any spikes on them. Then, from the Nvidia settings, ensure the power option is set to maximum performance, and recheck again. (don't change anything in the code yet). – iSR5 Aug 18 '19 at 06:21
  • @iSR5 I have done what you said both before and after setting the power option to maximum performance, it is the same as before and after. – Water Aug 18 '19 at 14:24
  • then I believe is something to do with rendering between the Nvidia and the Monitor as I assume that the long pause (as you called it) its just some tasks waiting for the data to be received from the processed events (OpenGL or whatever else handling it). You'll need to know at which process is delaying it. You can put some checkpoints on your code at where you suspect the issue and debug it again, to see which part of your code is causing this (which will tell you which part of OpenTK is causing it). – iSR5 Aug 18 '19 at 19:41
  • @iSR5 I will edit my answer for what I'm about to say (I think I left out this information, my bad): The pause appears to happen during the rendering calculations but before any of the actual OpenGL calls, for some reason the main thread appears to go to sleep inside the raw calculations of the geometry and does not switch back to it. I've isolated the code to where it happens, but I can't explain why the C# VM will stall during the exact same instructions with the exact same data on sequential rendering passes. This may change your response possibly. – Water Aug 18 '19 at 20:41
  • This could tell you what to do, I don't know the code, but I assume your code was depending on V-Sync optimizations. So, your OpenTK might handle the rendering differently when vsync is off. You might check if your code is optimized with aysnc and also you might divide the data into chunks (if possible) to reduce the time of calculations. – iSR5 Aug 18 '19 at 21:40
  • 1
    It's worth noting a few things: A) structs are not guaranteed to be on the stack... they favor that but the runtime may move them elsewhere. B) If you're not passing them by `ref` they are definitely getting moved to the heap. – Mgetz Aug 20 '19 at 16:04
  • 4
    @Mgetz, this is not correct. `struct`s are not moved to the heap when they are passed by-value, without `ref`/`in`. They are copied using the thread's stack. – dymanoid Aug 21 '19 at 08:28
  • 1
    @dymanoid `in the Microsoft implementation of C# on the desktop CLR, value types are stored on the stack when the value is a local variable or temporary that is not a closed-over local variable of a lambda or anonymous method, and the method body is not an iterator block, and the jitter chooses to not enregister the value.` https://blogs.msdn.microsoft.com/ericlippert/2010/09/30/the-truth-about-value-types/ – MineR Aug 22 '19 at 04:09
  • @MineR, this is correct. But this doesn't state, that `struct`s provided as method arguments by-value are moved to the heap, as you describe. – dymanoid Aug 22 '19 at 07:49
  • I'd have a look and see if the GC is the cause for the stutters. Easy to do, stell it to pause GC and see if the issues goes away. https://stackoverflow.com/questions/6005865/prevent-net-garbage-collection-for-short-period-of-time – Walter Verhoeven Aug 22 '19 at 14:18
  • @ComputerAidedTradingSystems As noted in the last paragraph of the post: I have profiled with netMemory and it shows so little GC that not even gen0 happens for a while. I went and did what was at that post anyways just to make 100% sure, and halting the GC did not help the situation at all. Also the GC doesn't register in dotTrace. I do object pooling to avoid generating garbage. – Water Aug 22 '19 at 17:13
  • okey, how saturated is your Threadpool, sometimes that slows things down. – Walter Verhoeven Aug 23 '19 at 15:05
  • @ComputerAidedTradingSystems It should be very underutilized, I do everything single threaded. In the future I'll be going multithreaded but right now everything is done on one thread. The only other threads are some random CLR worker that I appear to have no control over. Or is there a way it could be saturated without me knowing? I only see three threads running in dotTrace. – Water Aug 23 '19 at 16:15
  • Some processes like to use Task, this goes to the ThreadPool, you have no control over 3rd parties… You can get the Queue from the ThreadPool as shown here, I have seen over 500 threads on the pool, that's going to give that "type of stutter" you explain https://devblogs.microsoft.com/oldnewthing/20170724-00/?p=96675 – Walter Verhoeven Aug 23 '19 at 16:27
  • @ComputerAidedTradingSystems I just checked (following the advice on the link after reading it) and it says I am using no additional threads. the ThreadPool has max available threads and says none are running. – Water Aug 24 '19 at 06:55
  • The only other thing(s) I can come up with then is locking, or hardware issues. have you tried windows Perf counters to see if HW issues. You looked at the 10 stalls article from Michale's at: https://michaelscodingspot.com/7-debugging-techniques-for-when-your-net-application-freezes-hangs/ – Walter Verhoeven Aug 24 '19 at 15:07
  • What is your code doing in between rendering calls? Can you step through the functions? Are you sure it's polling and not stuck in a blocking call? – mostanes Aug 25 '19 at 21:50
  • I would still suspect some GC here (who handles those vsync buffers?). Also, if this is a `Unity` project you might run into *boxing* of values that you wouldn't expect to be boxed. 1) Is this Unity or VS? 2) Can you show some of your for loops (including type information) on your hottest paths? – l33t Aug 26 '19 at 12:20
  • I see that user32 takes most of the time here, could you pleas expand it to see what it's doing? – Grim Maple Aug 26 '19 at 15:35
  • @mostanes It is just doing simple calculations with structs and inserting things into an array – Water Aug 27 '19 at 00:59
  • @l33t There is no GC according to the profiler, how could it be the GC if not even gen0 runs? It is not unity. There's a fair amount of code that I couldn't post it unfortunately. I can only say that it's recursive and does no allocations and does not call anything outside of the function, has no threading, is extremely self contained, and has no unsafe logic anywhere. – Water Aug 27 '19 at 01:01
  • @GrimMaple I tried but it didn't show anything. Is there a way I can see what it is doing in some other way? – Water Aug 27 '19 at 01:01
  • Maybe try inserting a Stopwatch and log elapsed time very often (preferably to some large pre-defined memory buffer) and compare against the profiler data (maybe I'm reading the profiling data wrong, but to me it looks like the code just isn't running and instead waiting on something). How are you doing the polling? Do you run any library code? Is it blocking? I'm seeing there a 21ms interval selected and only 8.3ms actually running. On that note, that ProcessEvents looks very suspicious. – mostanes Aug 27 '19 at 14:47
  • It would help a lot to see some code, particularly of your render loop. Also how is your CPU usage in the task manager when Vsync is off? – Tyron Jun 27 '20 at 09:02
  • How many frames per second are you writing when vsync is off? – cineam mispelt Jun 30 '20 at 07:27
  • There is any chance to be CPU cache? Do you tried on another machine? – Leo Sep 11 '20 at 00:37

2 Answers2

1

Is your application window by any chance obscured by the Windows 10 taskbar when the spikes occur? I have seen significant stuttering linked to this phenomenon. I believe the rendering of the semi translucent taskbar implies creation of off-screen buffers which interferes with any frame-critical rendering loop.

l33t
  • 18,692
  • 16
  • 103
  • 180
-1

This is not an answer for your issue but just about some strange result in profiling

I copied it from our private GitHub issue

Just to be clear, the message from this answer is, you should take profiling result carefully, because the provided profiling result, seems to have some strange values, so I just mentioned that may be the issue is with profiling operations.


because, with multiple threads in the thread pool, the wait time will count multiple times, then we will get these strange results.

for example, for the previous test #255 (comment) where we only execute all startups in the main thread we have Main Thread and Finalizer

enter image description here

But with the current test, we have many thread pools, and the profiler accumulate all of them, although these are executed at the same time, so this is not real-time, it is a crazy thing, but this what happens

enter image description here

So if we just select the Main Thread, we will get the correct result

enter image description here

but to get the actual profiled not the real, time for SetParameterasync method, we must go to its thread and select its category

enter image description here

E_net4
  • 27,810
  • 13
  • 101
  • 139