I have a quantized tflite model that I'd like to benchmark for inference on an NVIDIA Jetson Nano. I'm using the tf.lite.Interpreter() API for inference. The process doesn't seem to run on the GPU, because the inference times on CPU and GPU are the same.
Is there any way to run a tflite model on GPU using Python?
I tried to force GPU usage with tf.device(), but it still doesn't work. The official documentation describes delegates for GPU acceleration, but I can't find anything equivalent for Python (I've sketched what I'd expect below the code).
import time

import numpy as np
import tensorflow as tf

# Attempt to force GPU placement; inference time is unchanged vs. CPU
with tf.device('/device:GPU:0'):
    interpreter = tf.lite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Random uint8 input matching the quantized model's input shape
    input_shape = input_details[0]['shape']
    input_data = np.random.randint(0, 256, size=input_shape, dtype=np.uint8)
    interpreter.set_tensor(input_details[0]['index'], input_data)

    # Time a single inference call
    start_time = time.time()
    interpreter.invoke()
    elapsed_time = time.time() - start_time
    print(elapsed_time)

    output_data = interpreter.get_tensor(output_details[0]['index'])
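For reference, this is roughly what I'd expect delegate loading to look like in Python, based on the tf.lite.experimental.load_delegate() API. The library name libtensorflowlite_gpu_delegate.so is purely a guess on my part; I haven't found any prebuilt GPU delegate binary for Python on the Jetson Nano, which is essentially what I'm asking about.

import tensorflow as tf

# Hypothetical sketch: the delegate .so name below is an assumption,
# not a library I've actually found shipped for the Nano.
try:
    gpu_delegate = tf.lite.experimental.load_delegate('libtensorflowlite_gpu_delegate.so')
    interpreter = tf.lite.Interpreter(
        model_path="model.tflite",
        experimental_delegates=[gpu_delegate])
    interpreter.allocate_tensors()
except (ValueError, OSError) as err:
    print("Could not load GPU delegate:", err)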