Object Detection Performance Issues Using Tensorflow 2.1.0 and Tensorflow Hub

Question

While working through the object detection documentation and examples found online that use the OpenImagesV4 model, I am seeing poor processing speed for detection. The code below is a stripped-down version of the detection loop so I can understand the performance metrics.

The camera stream processes fine without any detection. Once detection is added, the feed falls behind by roughly 20 seconds. I have seen this done in TF 1.14 using the old object detection approach with tf.Graph() functions and near-zero delay on a different model, so my question is: where can I gain performance for the feed stream, or where are the hang-ups in this stripped-down version?

This is using the GPU for processing, but utilization only spikes to about 6%. My original thought was to introduce threading for the detection process, but I am not sure how to go about that or whether it is necessary.
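For reference, the threading idea could look roughly like the sketch below. It is only an illustration under stated assumptions: `LatestFrameWorker` and `fake_detector` are hypothetical names, integers stand in for camera frames, and a short `time.sleep` stands in for the real detector call. The worker thread always processes the newest submitted frame and silently drops older ones, so the display loop never waits on detection.

```python
import threading
import time


class LatestFrameWorker:
    """Runs a slow function on the most recent input only, in a background thread."""

    def __init__(self, slow_fn):
        self._slow_fn = slow_fn
        self._lock = threading.Lock()
        self._pending = None        # newest frame that has not been processed yet
        self._has_pending = False
        self._result = None         # result of the last completed call
        self._stop = threading.Event()
        self._wake = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, frame):
        # Overwrite any unprocessed frame: stale frames are dropped on purpose.
        with self._lock:
            self._pending = frame
            self._has_pending = True
        self._wake.set()

    def latest_result(self):
        with self._lock:
            return self._result

    def stop(self):
        self._stop.set()
        self._wake.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            self._wake.wait()
            self._wake.clear()
            with self._lock:
                if not self._has_pending:
                    continue
                frame = self._pending
                self._has_pending = False
            result = self._slow_fn(frame)   # slow call runs outside the lock
            with self._lock:
                self._result = result


def fake_detector(frame):
    # Hypothetical stand-in for the real detector call.
    time.sleep(0.05)
    return frame * 2


worker = LatestFrameWorker(fake_detector)
for i in range(20):
    worker.submit(i)     # in the real loop: the frame from cap.read()
    time.sleep(0.01)     # stand-in for reading and displaying a frame
time.sleep(0.5)          # give the worker time to finish the last frame
worker.stop()
final = worker.latest_result()
print("last result:", final)
```

In the real loop, `submit` would be called with each captured frame and `latest_result()` would be used when drawing. Whether this helps depends on the detector releasing the GIL while it runs, which TensorFlow ops generally do, but it would not make each individual detection any faster.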

Software

TensorFlow 2.1.0
CUDA 10.1
cuDNN 7

Hardware

CPU: Intel i7-4820K
GPU: GeForce GTX 1660 (6GB)
Memory: 16GB

import cv2
import time
import gc
from datetime import datetime
import tensorflow as tf
import tensorflow_hub as hub

low_res_vid_source = "http://192.168.1.85:14238/videostream.cgi?loginuse=####&loginpas=######"
hi_res_vid_source = "rtsp://####:####@192.168.1.85:10554/tcp/av0_0"
cap = cv2.VideoCapture(low_res_vid_source)

#Low Res (640): Hi Res (1280)
width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)

#Low Res (480): Hi Res (720)
height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)

print("Dimensions: Width: ", width, "Height: ", height)
#Remote Loading
#module_handle = "https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1"

#Local Loading
module_handle = "C://Users//Isaiah//tf2//Tutorial Sets//Expert//HubCache//ddd04e3eaa283f2b3ae566e084863074d12b403a"
detector = hub.load(module_handle).signatures['default']

def LoadStream():
   ret, frame = cap.read()
   if not ret:
      # Skip this iteration if the stream did not deliver a frame
      return
   image_resize_val = (1280, 720)
   frame = cv2.resize(frame, image_resize_val)

   ## Average Calculation Time of Conversion Of Pixel Normalization = 0.018950 Seconds
   frame = frame / 255

   ## Average Calculation Time of Conversion Of Image Data Type      = 0.001999 Seconds
   converted_img = tf.image.convert_image_dtype(frame, tf.float32)[tf.newaxis, ...]

   ## Average Calculation Time of Loading Results From Detector      = 1.7 Seconds
   time_start = time.time()
   results = detector(converted_img)
   time_end = time.time()
   print("Detection Took: ", time_end - time_start)
   cv2.imshow('camera feed', frame)


while True:
   LoadStream()

   if cv2.waitKey(1) & 0xFF == ord('q'):
      cv2.destroyAllWindows()
      break

Output from the conda environment for this code is as follows; nothing really stands out:

(tf2-gpu) C:\Users\Isaiah\tf2\Tutorial Sets\Expert\Camera_Feed>python Camera_Feed_Raw.py
2020-05-03 16:52:36.567941: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Dimensions: Width:  640.0 Height:  360.0
2020-05-03 16:54:52.037826: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-05-03 16:54:52.253465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1660 computeCapability: 7.5
coreClock: 1.815GHz coreCount: 22 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.86GiB/s
2020-05-03 16:54:52.260714: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-05-03 16:54:52.272442: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-03 16:54:52.282134: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-05-03 16:54:52.287729: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-05-03 16:54:52.300130: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-05-03 16:54:52.307647: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-05-03 16:54:52.326362: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-03 16:54:52.331006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-03 16:54:52.334046: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2020-05-03 16:54:52.626783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1660 computeCapability: 7.5
coreClock: 1.815GHz coreCount: 22 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.86GiB/s
2020-05-03 16:54:52.633826: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-05-03 16:54:52.638740: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-03 16:54:52.642777: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-05-03 16:54:52.647763: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-05-03 16:54:52.651710: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-05-03 16:54:52.656789: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-05-03 16:54:52.660852: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-03 16:54:52.667018: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-03 16:54:53.626966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-03 16:54:53.630823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-05-03 16:54:53.633295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-05-03 16:54:53.638096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4630 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660, pci bus id: 0000:03:00.0, compute capability: 7.5)
2020-05-03 16:57:25.429470: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-03 16:57:26.697611: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-03 16:57:29.627538: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.
Detection Took:  58.80091857910156
Detection Took:  1.747373104095459
Detection Took:  1.7253808975219727
Detection Took:  1.736377477645874
Detection Took:  1.7273805141448975
Detection Took:  1.7343783378601074
Detection Took:  1.742375373840332
Detection Took:  1.7413759231567383
Detection Took:  1.7293803691864014
Detection Took:  1.7283804416656494
Detection Took:  1.7403762340545654
Detection Took:  1.7323787212371826
Detection Took:  1.7373778820037842
Detection Took:  1.7323782444000244
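The timings above can be turned into a rough throughput figure. The sketch below copies the printed values and treats the first call (about 58.8 s) as one-time warm-up, which is an assumption rather than something the log confirms. The 30 fps camera rate (`ASSUMED_CAMERA_FPS`) is also an assumption, since the stream's frame rate is not stated anywhere above.

```python
from statistics import mean

# "Detection Took" values copied from the log output above, in seconds.
timings = [
    58.80091857910156,
    1.747373104095459, 1.7253808975219727, 1.736377477645874,
    1.7273805141448975, 1.7343783378601074, 1.742375373840332,
    1.7413759231567383, 1.7293803691864014, 1.7283804416656494,
    1.7403762340545654, 1.7323787212371826, 1.7373778820037842,
    1.7323782444000244,
]

ASSUMED_CAMERA_FPS = 30.0  # assumption: not stated in the question

# Treat the first call as warm-up and the rest as steady state.
warmup, steady = timings[0], timings[1:]
steady_mean = mean(steady)
detections_per_second = 1.0 / steady_mean
frames_arriving_per_detection = ASSUMED_CAMERA_FPS * steady_mean

print(f"warm-up call:           {warmup:.1f} s")
print(f"steady-state mean:      {steady_mean:.3f} s per detection")
print(f"detection throughput:   {detections_per_second:.2f} per second")
print(f"frames per detection:   {frames_arriving_per_detection:.0f} (at the assumed camera rate)")
```

If the capture side buffers frames faster than the loop consumes them, reading one frame per detection would fall further and further behind the live stream, which may be part of the delay described in the question.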

Williams, in many cases this is expected. Take a look at this [issue](https://stackoverflow.com/questions/58441514/why-is-tensorflow-2-much-slower-than-tensorflow-1), which discusses in depth why TensorFlow 2.x can be much slower than TensorFlow 1.x. (May 17 '20 at 03:48)
