
Is there any public document that clearly states CoreML's strategy for GPU device placement when running inference models on macOS? How does it decide whether to run on the integrated GPU, a discrete GPU, or the CPU? Can one reliably 'force' one path? How does this change for systems like the new Mac Pro with multiple discrete GPUs, as well as multiple eGPUs?

My testing on my rMBP indicates the answer is no - and that temperature, battery level, being plugged in to power, automatic graphics switching settings, app support, and perhaps even some MLModel architecture heuristic all play a role in device placement.

Longer version, with context:

I'm curious whether there is any public documentation on CoreML's device-selection heuristic. With the addition of the preferredMetalDevice API on MLModelConfiguration in macOS 10.15, I imagined it would be possible to force the MTLDevice an MLModel / Vision request runs on.
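For reference, a minimal sketch of what that setup looks like. The `MyModel` class name and the device choice are placeholders for illustration, not from the original post; `preferredMetalDevice` is a hint, not a guarantee:

```swift
import CoreML
import Metal
import Vision

let config = MLModelConfiguration()
config.computeUnits = .all

// Prefer a specific GPU. Here we just grab the first device Metal
// reports, purely for illustration - pick yours deliberately.
if let device = MTLCopyAllDevices().first {
    config.preferredMetalDevice = device
}

// "MyModel" stands in for an Xcode-generated model class.
let model = try MyModel(configuration: config)
let vnModel = try VNCoreMLModel(for: model.model)
let request = VNCoreMLRequest(model: vnModel)
```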

In my testing with integrated, discrete and eGPU on my 2018 rMBP with Vega 20, it appears only the eGPU consistently runs the CoreML model when requested.

My CoreML model is a pipeline model consisting of a MobileNet classifier with multiple outputs (multi-head classifiers attached to a custom feature extractor).

I'm curious to understand device-selection preference for a few reasons:

a) I'd like to ensure my MLModel is fed CIImages backed by MTLTextures local to the device inference will be run on, to limit PCIe transfers and keep processing on a single GPU device

b) My model is actually fed frames of video, and WWDC '19 / macOS 10.15 introduced VideoToolbox and AVFoundation APIs to help force particular video encoders and decoders onto specific GPUs.

In theory, if all works well, I should be able to specify the same MTLDevice for video decode, preprocessing, CoreML/Vision inference, and subsequent encoding - keeping all IOSurface backed CVPixelBuffers, CVMetalTextureRefs, MPSImages and friends resident on the same GPU.
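As a sketch of that single-device plumbing, assuming `device` is the chosen MTLDevice - the texture cache and the VideoToolbox decoder-specification key are real APIs, but treat the exact wiring as illustrative:

```swift
import Metal
import CoreVideo
import VideoToolbox

// The GPU everything should stay resident on.
let device = MTLCreateSystemDefaultDevice()!

// A texture cache bound to that device, so IOSurface-backed
// CVPixelBuffers map to MTLTextures local to the same GPU.
var textureCache: CVMetalTextureCache?
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache)

// Ask VideoToolbox to decode on the same GPU (again, a preference, not
// a guarantee). Pass this dictionary as the videoDecoderSpecification
// when creating a VTDecompressionSession.
let decoderSpec: [CFString: Any] = [
    kVTVideoDecoderSpecification_PreferredDecoderGPURegistryID:
        NSNumber(value: device.registryID)
]
```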

Apple has a Pro Apps WWDC video suggesting this is the intended fast path for multi-GPU support and Afterburner decoder support going forward.

Does CoreML ACTUALLY allow suggested device placement to work?

I am running a Retina MacBook Pro 2018 with a Vega 20 GPU, and trying various methods to get the Vega 20 to light up:

  • Disabling automatic graphics switching

  • Disabling automatic graphics switching / setting NSSupportsAutomaticGraphicsSwitching to False

  • Disabling automatic graphics switching / setting NSSupportsAutomaticGraphicsSwitching to True

  • Enabling automatic graphics switching / setting NSSupportsAutomaticGraphicsSwitching to False

  • Enabling automatic graphics switching / setting NSSupportsAutomaticGraphicsSwitching to True

  • Having a full battery and being plugged into my Apple power adaptor

  • Having a full battery and being plugged into my eGPU

Results:

  • I can reliably get the eGPU to run inference on my MLModel if I use MLModelConfig with preferredMetalDevice - every time.

  • I can fairly reliably get the integrated GPU to run inference if I request it - but on occasion, with some configurations of battery power, being plugged in, or automatic graphics switching options, it doesn't run.

  • I cannot reliably get the discrete GPU to run with any of the above combinations of configurations - but I do see that all of my resources (textures, etc.) are resident on that GPU, and that CoreML is configured to run there. It just doesn't report any activity.

I have configured my Info.plist for proper eGPU support: I can hot plug / detect device changes and dispatch work to eGPUs, and I also support detecting device-removal requests. That all works. What doesn't work is CoreML respecting my device placement!
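For context, the hot-plug handling described above roughly follows Metal's device-observer API; a sketch, assuming macOS 10.13+:

```swift
import Metal

// Enumerate GPUs and get notified when eGPUs come and go.
let (devices, observer) = MTLCopyAllDevicesWithObserver { device, notification in
    switch notification {
    case .wasAdded:
        print("GPU attached: \(device.name)")
    case .removalRequested:
        print("GPU removal requested: \(device.name)") // wind down work here
    case .wasRemoved:
        print("GPU removed: \(device.name)")
    default:
        break
    }
}
print("GPUs at launch: \(devices.map { $0.name })")

// When no longer interested:
// MTLRemoveDeviceObserver(observer)
```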

vade
  • I'm very interested in knowing if PyTorch or TensorFlow can use the M1 Max GPU. – Charlie Parker Nov 02 '21 at 15:30
  • If those systems leverage Metal back ends, which I believe they do, it should be possible. They won't be able to leverage the ANE without opting into either private APIs (maybe with Apple's permission) or exporting a CoreML model in ML Package or ML Model format and using CoreML APIs in Swift / Obj-C, or coremltools prediction in Python - which I believe works now for inference on M1 machines? – vade Nov 03 '21 at 16:08
  • I've answered several of your many questions, but other questions do not have enough information to even guess as to why things aren't functioning as you expect. – Jeshua Lacock Jul 31 '22 at 20:46

1 Answer


There is no public document clearly stating CoreML's GPU utilization strategy. Note that your question is really asking several different questions, and should be focused on one question per post, but I will do the best I can to answer them.

You can “force” it to run on the CPU only:

import CoreML

let config = MLModelConfiguration()
config.computeUnits = .cpuOnly

Or CPU and GPU:

config.computeUnits = .cpuAndGPU

Or all available compute units, which includes the Neural Engine if it is available and the MLModel's layers support it:

config.computeUnits = .all

When there are multiple Metal devices, you can choose which one to use. See this example code for choosing between the highest-powered Metal device, an external GPU, or a GPU not driving a display.
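One possible selection policy can be sketched from MTLDevice's properties; the fallback order here is my own illustration, not from the sample code:

```swift
import Metal

let all = MTLCopyAllDevices()

// External (eGPU) devices report isRemovable; integrated GPUs report
// isLowPower; a GPU not driving any display reports isHeadless.
let externalGPUs = all.filter { $0.isRemovable }
let discreteGPUs = all.filter { !$0.isLowPower && !$0.isRemovable }
let headlessGPUs = all.filter { $0.isHeadless }

// Example policy: prefer an eGPU, then a discrete GPU, then the default.
let chosen = externalGPUs.first
    ?? discreteGPUs.first
    ?? MTLCreateSystemDefaultDevice()
```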

You can also choose to allow reduced-precision accumulation on the GPU:

config.allowLowPrecisionAccumulationOnGPU = true
Jeshua Lacock
  • Because your answer isn't sufficient and is sort of obvious. I know you're trying to be helpful, so I do appreciate it. None of the above is strictly true when dealing with multi-GPU systems, which is what I was discussing: discrete GPUs, multi-GPU Mac Pros, or systems with integrated, discrete and eGPUs exhibit odd behavior with CoreML's device placement heuristics, and the preferred Metal device is exactly that - preferred. The system appears to shunt your load around as it sees fit, and when you have an eGPU your discrete GPU appears not to be preferred most of the time. – vade Aug 02 '22 at 04:54
  • Additionally, when system services like media analysis run your device placement request may fail, as there appears to be limited scheduling (or resource sharing, or provisioning?) of the ANE if you are on M1. – vade Aug 02 '22 at 04:55
  • You should realize two things: (1) your question should have been closed to begin with, you're asking many different things and (2) you provide no source code and ask why things aren't working the way you expect. So if the quality of my answer is lacking, it is directly related to the poor quality of your question. – Jeshua Lacock Aug 02 '22 at 05:31
  • It is possible and supported to run on the eGPU. Why it isn't working for you would be anyone's guess based on what you've provided. – Jeshua Lacock Aug 02 '22 at 05:32
  • As for my answer being obvious, they were directly answering some of your questions. If they are obvious, why ask the question? – Jeshua Lacock Aug 02 '22 at 05:33
  • Dude, seriously. As I specifically mentioned in my post, eGPU works, but it isn't CONSISTENT NOR GUARANTEED. – vade Aug 02 '22 at 15:42
  • Literally: "In my testing with integrated, discrete and eGPU on my 2018 rMBP with Vega 20, it appears only the eGPU consistently runs the CoreML model when requested." The discrete GPU doesn't always work when preferred device placement is requested; it will occasionally work depending on system status (battery / temperature and running apps), and it works even less reliably if you have an eGPU. The eGPU, if provided, tends to work but appears to change based on system status and running apps. Integrated tends to work but appears to change availability based on system status (battery / temperature and running apps). – vade Aug 02 '22 at 15:45
  • Dude, I said it isn't working for you as you expect. You aren't providing any information that would make it possible to figure out why it doesn't work consistently for you, and I don't appreciate the attitude, so I'm done trying to help you. – Jeshua Lacock Aug 02 '22 at 19:34
  • I am legitimately confused as to how you think you are helping. All you've done is spell out how to use the API, which I document in my question that I am using, understand, and which works under *some circumstances*. All you've said is 'CoreML is a black box' which - yes, welcome to developing on Apple APIs. I do appreciate the time, but please don't be patronizing, and please read the entirety of the scope and context of the question. – vade Aug 02 '22 at 20:14
  • You asked does a relevant public document exist. The answer is NO. You asked how does CoreML decide what hardware to utilize, I answered with code showing all the available options. You asked how to choose what metal device to run on, I provided a link to sample code. You're crying for help, but again, you've not even tried to post any sample code showing what you are doing so that someone might be able to offer help for yet ANOTHER question. – Jeshua Lacock Aug 02 '22 at 20:45
  • In case you're not aware, nearly 100% of quality SO questions provide source code so that people can see what is going on and possibly offer suggestions. No source code, no way to help with your problem dude. – Jeshua Lacock Aug 02 '22 at 20:47