Slow loading time - EfficientDet D2

Question

I am loading a Tensorflow 2 version of EfficientDet D2 (http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d2_coco17_tpu-32.tar.gz) using a Jetson AGX Xavier.

I run the following script:

#!/usr/bin/python3
import tensorflow as tf
import time
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils

PATH_TO_SAVED_MODEL = "./efficientdet_d2_coco17_tpu-32/saved_model/"

print('Loading model...')
start_time = time.time()

# Load saved model and build the detection function
detect_fn = tf.saved_model.load(PATH_TO_SAVED_MODEL)

end_time = time.time()
elapsed_time = end_time - start_time
print('Done! Took {} seconds'.format(elapsed_time))

However, the performance results is a loading time of more than 13 minutes. This is the output after the command has been executed:

./test.py
2021-07-04 10:58:58.074413: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-04 10:59:05.375568: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
Loading model...
2021-07-04 11:00:54.337115: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-07-04 11:00:54.342226: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-07-04 11:00:54.347726: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.347959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 31.17GiB deviceMemoryBandwidth: 82.08GiB/s
2021-07-04 11:00:54.348037: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-04 11:00:54.353788: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-07-04 11:00:54.354040: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-07-04 11:00:54.358471: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-04 11:00:54.359514: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-04 11:00:54.364904: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-07-04 11:00:54.369140: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-07-04 11:00:54.369861: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-04 11:00:54.370262: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.370843: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.371060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-07-04 11:00:54.375404: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.375623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 31.17GiB deviceMemoryBandwidth: 82.08GiB/s
2021-07-04 11:00:54.375714: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-07-04 11:00:54.375823: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-07-04 11:00:54.375908: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-07-04 11:00:54.376011: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-04 11:00:54.376090: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-04 11:00:54.376167: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-07-04 11:00:54.376287: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-07-04 11:00:54.376369: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-04 11:00:54.376673: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.376972: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:00:54.377093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-07-04 11:05:01.847060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-04 11:05:01.847174: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2021-07-04 11:05:01.847226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2021-07-04 11:05:01.847710: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:05:01.848589: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:05:01.848911: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:908] ARM64 does not support NUMA - returning NUMA node zero
2021-07-04 11:05:01.849096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 19271 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2021-07-04 11:05:01.850298: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
Done! Took 793.8719098567963 seconds

With the computing power of the Xavier I would expect much better performance? Anybody knows what the cause to this could be?

Thanks for any help or input!

You are loading the model from disk. Are you using an SD card? Standard SD card has read bandwidth of only around 12.5Mb/s. This means it will take 80s to read 1 Gb. That said efficientdet should be quite small, so not sure why it takes that long. The compute power has nothing to do with SD card bandwidth. — bumpbump, Jul 05 '21 at 01:30

score 0 · Answer 1 · answered Jul 06 '21 at 14:06

The time it takes is not just to load the model, but to initialize the device. Maybe the problem is in the driver. To prove it, try to init a smaller model, or toy example like a+b=c. I expect it will take similar time.

Also the computing power has nothing to do with loading the model. Loading of the model depends more on memory management of the driver and TF. The actual building of the model in the memory may be done on the CPU, even when using GPU or other accelerator (just guessing).

My experience with CUDA and TF is 5 minutes initialization time with one version of CUDA, TF and GPU driver. And less than 30 sec with another version of CUDA and TF, on the same hardware (8x1080ti GPUs).

Slow loading time - EfficientDet D2

1 Answers1