
I tried the example provided by Qualcomm here:

https://github.com/globaledgesoft/deeplabv3-application-using-neural-processing-sdk

https://github.com/globaledgesoft/deeplabv3-application-using-neural-processing-sdk/blob/master/AndroidApplication/app/src/main/java/com/qdn/segmentation/tasks/SegmentImageTask.java

It says this piece of code should take 31 ms on GPU16 to complete:

    // [31ms on GPU16, 50ms on GPU] execute the inference
    outputs = mNeuralnetwork.execute(mInputTensorsMap);

For me the same example takes 14 seconds. I am using the Open-Q 845 HDK development kit.
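
For reference, a minimal way to time that call on the device looks roughly like this (a sketch only; the SystemClock/Log wrapping is my addition, while NeuralNetwork and FloatTensor are the SNPE SDK types used in the example):

    import android.os.SystemClock;
    import android.util.Log;

    import com.qualcomm.qti.snpe.FloatTensor;
    import com.qualcomm.qti.snpe.NeuralNetwork;

    import java.util.Map;

    // Wraps the execute() call from SegmentImageTask.java with a simple timer
    // so the measured latency can be compared against the 31 ms comment.
    final class InferenceTimer {
        static Map<String, FloatTensor> timedExecute(NeuralNetwork network,
                                                     Map<String, FloatTensor> inputs) {
            final long start = SystemClock.elapsedRealtime();
            final Map<String, FloatTensor> outputs = network.execute(inputs);
            final long elapsedMs = SystemClock.elapsedRealtime() - start;
            Log.d("SegmentImageTask", "Inference took " + elapsedMs + " ms");
            return outputs;
        }
    }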

I asked my professor, and he said that the app I am installing is not trusted by the development kit firmware, which is why it takes so much time to execute. He suggested that I rebuild the firmware with my app installed as a system app. What other reasons could there be?

Simran Marok

1 Answer


Yes, this is very confusing; I ran into the same problem. What I noticed is that, on my device at least (Snapdragon 835), ResizeBilinear_2 and ArgMax take an insane amount of time. If you disable CPU fallback you will see that ResizeBilinear_2 is actually not supported, since the DeepLab implementation uses align_corners=true.

If you pick ResizeBilinear_1 as the output layer instead, there will be a significant improvement in inference time, with the trade-off that you lose the bilinear resize and ArgMax layers and have to implement them yourself.
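
For the argmax part, a naive per-pixel loop over the raw output buffer is enough to get started (a rough sketch; the [height, width, classes] layout is an assumption, so check the actual shape of your output tensor):

    // Naive per-pixel argmax over a [height, width, numClasses] logits buffer
    // read back from the truncated output layer. The memory layout here is an
    // assumption; verify it against your tensor's shape.
    final class ArgMax {
        static int[] perPixel(float[] logits, int height, int width, int numClasses) {
            final int[] labels = new int[height * width];
            for (int pixel = 0; pixel < height * width; pixel++) {
                int best = 0;
                float bestScore = logits[pixel * numClasses];
                for (int c = 1; c < numClasses; c++) {
                    final float score = logits[pixel * numClasses + c];
                    if (score > bestScore) {
                        bestScore = score;
                        best = c;
                    }
                }
                labels[pixel] = best;
            }
            return labels;
        }
    }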

But even then, using the GPU I was only able to reach around 200 ms per inference. With the DSP I did manage to get around 100 ms.
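
For context, this is roughly how I cut the graph and pick the runtime (a sketch only, based on the SNPE Android Java API's NeuralNetworkBuilder; treat setOutputLayers, setCpuFallbackEnabled, and the exact setModel overload as assumptions and check them against your SDK version):

    import android.app.Application;

    import com.qualcomm.qti.snpe.NeuralNetwork;
    import com.qualcomm.qti.snpe.SNPE;

    import java.io.File;
    import java.io.IOException;

    // Rough sketch: build the network with an explicit runtime order, no CPU
    // fallback, and the graph truncated at ResizeBilinear_1. Method names are
    // from the SNPE SDK version I used; they may differ in yours.
    final class NetworkFactory {
        static NeuralNetwork build(Application app, File dlcFile) throws IOException {
            return new SNPE.NeuralNetworkBuilder(app)
                    // Prefer the DSP, then fall back to the GPU.
                    .setRuntimeOrder(NeuralNetwork.Runtime.DSP,
                                     NeuralNetwork.Runtime.GPU)
                    // Disabling CPU fallback surfaces unsupported layers such as
                    // ResizeBilinear_2 instead of silently running them on the CPU.
                    .setCpuFallbackEnabled(false)
                    // Cut the graph at ResizeBilinear_1; resize + argmax are done
                    // outside the network afterwards.
                    .setOutputLayers("ResizeBilinear_1")
                    .setModel(dlcFile)
                    .build();
        }
    }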

Also make sure that your kit has OpenCL support; otherwise the GPU runtime won't work, AFAIK.

Side note: I'm currently still testing things with DeepLab + SNPE as well. Comparing it with the TFLite GPU delegate, I noticed there are some differences in the output. While SNPE is in general about twice as fast, there are a lot of segmentation artifacts, which can result in an unusable model. Check this out: https://developer.qualcomm.com/forum/qdn-forums/software/snapdragon-neural-processing-engine-sdk/34844

What I have found so far is that if you drop the output stride to 16, not only will you get double the inference speed, but the artifacts also seem to be less visible. Of course, you lose some accuracy by doing so. Good luck!

Billy Batson
  • Hello, I built my own custom model instead of using the one provided with the example. I also signed my app with the platform key and overclocked my dev kit. I got 0.165 ms (ArgMax) and 0.225 ms (ResizeBilinear_1) on the DSP, but for GPU/GPU16 I get 1.7 seconds (ArgMax) per inference for the first four frames, and then it jumps to 8 seconds per frame. Do you know why? – Simran Marok Sep 22 '19 at 06:09
  • This is because if you set the DSP runtime and provide a non-quantized model, SNPE will try to quantize the model based on the first few frames it sees. To avoid this you need to use the toolkit to quantize your model before runtime, providing the tool some sample inputs so it can adjust the weights for 8-bit quantization. See https://developer.qualcomm.com/docs/snpe/model_conversion.html – Billy Batson Sep 23 '19 at 13:33
  • Hello, I checked with the platform validator. I got these results for GPU/GPU_FLOAT16: ``>>isRuntimeAvailable GPU = false >>runtimeCheck GPU = false >>libVersion GPU = null >>coreVersion GPU = null`` I got valid results for CPU and DSP. Any ideas? – Simran Marok Sep 23 '19 at 21:37
  • AFAIK, SNPE only works fully if the vendor also provides OpenCL support. Maybe check whether your hardware supports it (see the sketch below). – Billy Batson Sep 30 '19 at 14:35
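
In case it helps, this is roughly how the values in that platform validator comment can be queried from code (a sketch only, based on the PlatformValidator Java API that ships with the SNPE SDK; the class and method names here are assumptions from the SDK documentation and may differ between versions):

    import android.content.Context;

    import com.qualcomm.qti.platformvalidator.PlatformValidator;
    import com.qualcomm.qti.platformvalidator.PlatformValidatorUtil;

    // Rough sketch: query GPU runtime availability in the same form as the
    // ">>isRuntimeAvailable GPU = ..." output shown in the comment above.
    // API names are assumptions; verify them against your SDK version.
    final class GpuRuntimeCheck {
        static void logGpuSupport(Context context) {
            final PlatformValidator validator =
                    new PlatformValidator(PlatformValidatorUtil.Runtime.GPU);
            System.out.println(">>isRuntimeAvailable GPU = "
                    + validator.isRuntimeAvailable(context));
            System.out.println(">>runtimeCheck GPU = "
                    + validator.runtimeCheck(context));
            System.out.println(">>libVersion GPU = "
                    + validator.libVersion(context));
            System.out.println(">>coreVersion GPU = "
                    + validator.coreVersion(context));
        }
    }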