5

I'm developing a Metal-based app, and in some cases properly compiled and linked shader code will cause the application to simply crash without throwing any errors.

A "crash" consists of a halt in visual output (in some cases preceded by a short stutter of a couple of alternating frames), while the rest of the application otherwise keeps running normally. The Xcode performance monitoring utilities report 60fps but 0ms GPU latency, and CPU-side execution continues, with calls to the Metal API still completing successfully.

No errors are reported to the console.

This is extremely difficult to debug, as I have no indication of where in shader code the error is coming from. It would help if I knew under what conditions this is actually supposed to happen, so that I can have a good list of things to check. Otherwise I'm just shooting in the dark whenever this comes up.

warrenm
lcmylin
  • I am working with compute kernels and I also have frequent crashes. No help from Xcode in any way. I comment out code until it works and then add pieces back. It takes enormous amounts of time. A Metal playground would be awesome for quickly testing small bits of code. – R Menke Aug 22 '15 at 14:50
  • After working with it for a little while more, it seems mainly to be a side effect of iOS' fault-recovery systems (I don't get a lot of these issues on OS X). I've narrowed down most crashes to either a shader performing too slowly (iOS seems to automatically crash apps when FPS goes below 1, to prevent an app from crashing the whole device) or when I access an invalid memory area (iOS apps are, after all, sandboxed). Now it would be nice if these systems would actually communicate to the Metal front-end that the driver had been crashed so that calls to the API would report an actual error. – lcmylin Aug 23 '15 at 15:08
  • I have a test project set up for C++ just to debug the code. Syntax highlighting and debugging go a lot better that way. Big shaders with lots of loops take too long, I think, and then it crashes, same as yours. – R Menke Aug 24 '15 at 19:23
  • I had a shader with three nested for loops that compiled fine but then refused to link at runtime, let alone run on the device. There seems to be some limit to shader function size/number of nested branches, but I can't find any specifics on what those limits are (and Xcode doesn't help at all here, obviously). It's just trial and error. – lcmylin Aug 25 '15 at 20:10
  • [My question about something similar](http://stackoverflow.com/questions/32193726/newcomputepipelinestatewithfunction-failed). I also noticed that nested loop problem. I have already eliminated almost all if statements and moved them back to the CPU. I am now dividing my shader up into three parts. This will add enormous amounts of overhead and it will limit how much can be done in parallel, but maybe it will help. – R Menke Aug 25 '15 at 21:46
  • The current Metal implementation on iOS was clearly designed for game graphics and interactive apps, not for hardcore compute. On OS X, on the other hand, it will happily consume all GPU resources for an indefinite amount of time and lock up the system (at least Windows puts a limit of 3 seconds for an individual graphics command before resetting the GPU). – lcmylin Aug 25 '15 at 22:10

2 Answers

3

The GPU can crash when you read or write off the end of an MTLBuffer, write off the end of an MTLTexture, or simply run too long. There is a watchdog timer that will reset the GPU if it doesn't complete its work within a few seconds. Work on the GPU is not preemptively scheduled, so long-running work can make the device seem locked up by preventing basic GUI tasks from executing. If you have long-running workloads, you need to split them up into many smaller kernels. To keep the interface responsive, keep each workload under 100 ms. To avoid video stuttering, a consistent frame rate is recommended.
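As a rough illustration of those two points (stay inside the buffer, and split long work into smaller pieces), here is a CPU-side C++ sketch rather than real Metal code. The names `Chunk`, `splitWorkload`, `scaleKernel` and the `maxPerDispatch` parameter are made up for this example; on the device, each chunk would correspond to one smaller dispatch, and the guard would sit at the top of the kernel function.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical description of one dispatch: which elements it covers.
struct Chunk {
    std::size_t offset;  // index of the first element handled by this dispatch
    std::size_t count;   // number of elements handled by this dispatch
};

// Split `total` elements into dispatches of at most `maxPerDispatch` elements.
// `maxPerDispatch` is an assumed tuning value, chosen by measurement so that
// each dispatch finishes comfortably before the watchdog would fire.
std::vector<Chunk> splitWorkload(std::size_t total, std::size_t maxPerDispatch) {
    std::vector<Chunk> chunks;
    if (maxPerDispatch == 0) return chunks;  // no sensible split exists
    for (std::size_t offset = 0; offset < total; offset += maxPerDispatch) {
        chunks.push_back({offset, std::min(maxPerDispatch, total - offset)});
    }
    return chunks;
}

// CPU stand-in for a kernel body. `gid` plays the role of the thread's
// position in the grid. Grids are often rounded up to a multiple of the
// threadgroup size, so some threads fall past the end and must do nothing.
void scaleKernel(std::vector<float>& buffer, const Chunk& chunk,
                 std::size_t gid, float factor) {
    if (gid >= chunk.count) return;      // guard: thread is outside this chunk
    std::size_t index = chunk.offset + gid;
    if (index >= buffer.size()) return;  // guard: never touch memory past the end
    buffer[index] *= factor;
}
```

For 10 elements and `maxPerDispatch = 4`, this yields three chunks covering elements 0-3, 4-7 and 8-9; in the last chunk, the two surplus threads of a rounded-up grid return early instead of writing off the end of the buffer.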

Ian Ollmann
  • This makes it very difficult to use Metal for any heavy/variable compute load on iOS. Is there really no preemption option, so heavy compute can continue while not blocking GUI tasks? – Hashman Jan 17 '18 at 20:00
1

I was having frequent crashes due to heavy Metal shaders as well, and managed to fix it by throttling the dispatch rate. You can do this easily by measuring the runtime of the last "frame" and inserting a wait before every dispatch, proportional to that runtime:

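// Wait for a fraction (RATIO) of the previous frame's run time before dispatching again.
[NSThread sleepForTimeInterval:_lastRunTime * RATIO];
NSDate *startTime = [NSDate date];
... [use Metal shaders] ...
// timeIntervalSinceNow is negative for a past date, so negate it to get elapsed seconds.
_lastRunTime = -[startTime timeIntervalSinceNow];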

I set the RATIO to 1.0, so it never uses more than 50% of the GPU. It obviously impacts frame rate, but beats random crashes. You can play with the ratio. The nice thing is that you don't have to worry about throttling too much or too little on different products, as it's a ratio of the runtime.
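To make the 50% figure concrete: if a dispatch takes time T and is preceded by a sleep of RATIO * T, the GPU is busy for T out of every T + RATIO * T, i.e. 1 / (1 + RATIO). The small C++ sketch below just encodes that arithmetic. The function names are made up for illustration, and it assumes the run time stays roughly constant from one frame to the next.

```cpp
#include <stdexcept>

// Fraction of wall-clock time the GPU is busy when every dispatch of length T
// is preceded by a sleep of ratio * T: T / (T + ratio * T) = 1 / (1 + ratio).
double gpuDutyCycle(double ratio) {
    if (ratio < 0.0) throw std::invalid_argument("ratio must be non-negative");
    return 1.0 / (1.0 + ratio);
}

// Inverse relation: the ratio needed to cap GPU usage at `targetDutyCycle`,
// which must lie in the range (0, 1].
double ratioForDutyCycle(double targetDutyCycle) {
    if (targetDutyCycle <= 0.0 || targetDutyCycle > 1.0) {
        throw std::invalid_argument("target duty cycle must be in (0, 1]");
    }
    return 1.0 / targetDutyCycle - 1.0;
}

// Sleep (in seconds) to insert before the next dispatch, given the measured
// run time of the previous one.
double sleepBeforeDispatch(double lastRunTimeSeconds, double ratio) {
    return lastRunTimeSeconds * ratio;
}
```

With a ratio of 1.0 the duty cycle works out to 0.5, matching the 50% above; capping the GPU at 25% would need a ratio of 3.0.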

Hashman