I'm trying to benchmark Windows ML against other backends and I'm seeing a weird bimodal distribution of inference times (see plot). This is with the CPU backend on the ARM64 architecture; on ARM there is no bimodal distribution.
I don't have a good intuition for why there are two modes in the distribution of inference times. There doesn't seem to be any temporal correlation: I run the network once per second, and it switches between the "slow" and "fast" modes seemingly at random.
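For reference, the measurement loop is essentially the following simplified sketch (the model path, input name, and tensor shape are placeholders for my actual network):

```cpp
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Foundation.Collections.h>
#include <winrt/Windows.AI.MachineLearning.h>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

using namespace winrt;
using namespace winrt::Windows::AI::MachineLearning;

int main()
{
    init_apartment();

    // Placeholder model; the real network is an ONNX model loaded the same way.
    auto model = LearningModel::LoadFromFilePath(L"model.onnx");
    LearningModelDevice device(LearningModelDeviceKind::Cpu);   // CPU backend
    LearningModelSession session(model, device);

    // Dummy input; shape and input name must match the model.
    std::vector<float> data(1 * 3 * 224 * 224, 0.5f);
    auto input = TensorFloat::CreateFromArray({ 1, 3, 224, 224 }, data);

    for (int i = 0; i < 100; ++i)
    {
        LearningModelBinding binding(session);
        binding.Bind(L"input", input);

        auto t0 = std::chrono::high_resolution_clock::now();
        session.Evaluate(binding, L"");                         // synchronous evaluation
        auto t1 = std::chrono::high_resolution_clock::now();

        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        std::printf("run %3d: %.2f ms\n", i, ms);

        std::this_thread::sleep_for(std::chrono::seconds(1));   // one evaluation per second
    }
}
```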
One guess is that Windows ML sometimes decides to use two threads and sometimes only one, possibly depending on estimated device load. However, unlike with TensorFlow Lite or Caffe2, I haven't found a way to control the number of threads Windows ML uses (see the TensorFlow Lite sketch at the end of this post for the kind of control I mean). So the question is:
Is there a way to control the number of threads Windows ML uses for evaluation in CPU mode, or is it guaranteed to use only one thread for computation in all cases?
Other pointers to what could cause this weird behavior are also welcome.
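For comparison, this is the kind of per-interpreter thread control I mean, roughly as it looks in TensorFlow Lite (the model path is a placeholder):

```cpp
#include <memory>
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main()
{
    // Placeholder model path.
    auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");

    tflite::ops::builtin::BuiltinOpResolver resolver;
    std::unique_ptr<tflite::Interpreter> interpreter;
    tflite::InterpreterBuilder(*model, resolver)(&interpreter);

    // This is the knob I'm missing in Windows ML: pin inference to one thread.
    interpreter->SetNumThreads(1);

    interpreter->AllocateTensors();
    interpreter->Invoke();
}
```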