I am trying to use a TensorFlow model, trained in Python, with WinML. I successfully converted the protobuf to ONNX. I obtained the following GPU inference times:
- WinML: 43 s
- ONNX Runtime: 10 s
- TensorFlow: 12 s
For reference, inference on the CPU takes around 86 s.
According to the performance tools, WinML does not seem to use the GPU as effectively as the others. WinML appears to use DirectML as its backend (we observe the DML prefix in the NVIDIA GPU profiler). Is it possible to use the CUDA inference engine with WinML? Has anyone observed similar results, with WinML being abnormally slow on the GPU?
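For context, the ONNX Runtime number above was measured with a session pinned to the CUDA execution provider. This is a minimal sketch of that setup; `model.onnx`, the input name `input`, and the input shape are placeholders for the actual model:

```python
import numpy as np


def preferred_providers(available):
    """Prefer the CUDA execution provider when present, always
    keeping the CPU provider as a fallback."""
    order = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in order if p in available]
    return chosen or ["CPUExecutionProvider"]


if __name__ == "__main__":
    # Requires the onnxruntime-gpu package; imported here so the
    # helper above stays usable without it.
    import onnxruntime as ort

    providers = preferred_providers(ort.get_available_providers())
    session = ort.InferenceSession("model.onnx", providers=providers)

    # Placeholder input; shape and name depend on the exported model.
    x = np.zeros((1, 224, 224, 3), dtype=np.float32)
    outputs = session.run(None, {"input": x})
```

If `CUDAExecutionProvider` is not in `ort.get_available_providers()`, ONNX Runtime silently falls back to the CPU, so it is worth checking which provider the session actually ended up using when comparing timings against WinML.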