6

I have been trying to tackle a problem where I need to track multiple people through multiple camera viewpoints in real time.
I found a solution, DeepCC (https://github.com/daiwc/DeepCC), on the DukeMTMC dataset, but unfortunately it has been taken down because of data-confidentiality issues. It used Fast R-CNN for object detection, triplet loss for re-identification, and DeepSORT for real-time multiple-object tracking.

Questions:
1. Can someone share some other resources regarding the same problem?
2. Is there a way to download and still use the DukeMTMC dataset for the multiple-tracking problem?
3. Is anyone aware of when the official website (http://vision.cs.duke.edu/DukeMTMC/) will be available again?

Please feel free to provide different variations of the question :)

user1932914
  • Not a full answer, but just to give you some hints. There is no way to download DukeMTMC. It contained large amounts of data split into several zip files, and the authors won't release it again. As far as I know, they even wrote an email to all previous benchmark participants saying that they condemn future use of DukeMTMC. There are some alternatives that are still online, like CamNeT (but this one has wrong ground truth). Currently, my research group is working on an alternative and our paper is under review. I can post it here if it gets accepted – TheWaveLad Jun 12 '20 at 13:55

2 Answers

2

A good deep learning library that I have used in the past for my work is Mask R-CNN (Mask Region-based Convolutional Neural Network). Although I have only used this algorithm on images and not on videos, the same principles apply, and it's very easy to make the transition to detecting objects in a video. The implementation uses TensorFlow and Keras, and you split your input data, i.e. images of people, into two sets: training and validation.
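The train/validation split itself needs nothing from the deep learning stack; a minimal sketch using only the standard library (the filenames, split fraction, and seed here are illustrative assumptions):

```python
import random

def split_dataset(image_files, val_fraction=0.2, seed=42):
    """Shuffle the image list and split it into training and validation sets."""
    files = list(image_files)
    random.Random(seed).shuffle(files)  # seeded so the split is reproducible
    n_val = int(len(files) * val_fraction)
    return files[n_val:], files[:n_val]  # (train, val)

# Toy file list standing in for a real image directory.
train, val = split_dataset([f"person_{i:03d}.jpg" for i in range(100)])
print(len(train), len(val))  # 80 20
```

Fixing the seed matters: if you re-run annotation or training later, the same images stay in the validation set, so the "never seen before" requirement below still holds.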

For training, use a third-party tool like the VGG Image Annotator (VIA) to annotate the people in the images. After the annotations have been drawn, export a JSON file containing all of them, which will be used for the training process. Do the same thing for the validation set, BUT make sure the images in the validation set have not been seen before by the algorithm.
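The exported annotations are plain JSON, so you can inspect or preprocess them with the standard library. A sketch of pulling polygon points out of a VIA-style export — the structure here is an assumption based on the VIA 2.x format used by the Mask R-CNN sample projects; check your own export, as field names vary between VIA versions:

```python
import json

# A minimal VIA-style export (structure assumed; real exports have more fields).
via_export = """
{
  "img1.jpg123": {
    "filename": "img1.jpg",
    "regions": [
      {"shape_attributes": {"name": "polygon",
                            "all_points_x": [10, 50, 30],
                            "all_points_y": [10, 10, 40]},
       "region_attributes": {"class": "person"}}
    ]
  }
}
"""

def load_polygons(export_text):
    """Return {filename: [list of (x, y) polygon points per region]}."""
    data = json.loads(export_text)
    result = {}
    for entry in data.values():
        polys = []
        for region in entry["regions"]:
            shape = region["shape_attributes"]
            polys.append(list(zip(shape["all_points_x"],
                                  shape["all_points_y"])))
        result[entry["filename"]] = polys
    return result

print(load_polygons(via_export))
```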

Once you have annotated both groups and generated the JSON files, you can start training the algorithm. Mask R-CNN makes training very easy: all you need to do is run a single command to start it. If you want to train on your GPU instead of your CPU, install Nvidia's CUDA, which works very well with supported GPUs and requires no coding after the installation.

During the training stage, you will generate weights files, which are stored in the .h5 format. Depending on the number of epochs you choose, one weights file is generated per epoch. Once training has finished, you just reference that weights file whenever you want to detect relevant objects, e.g. in your video feed.
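Since there is one checkpoint per epoch, a common chore is picking the most recent one. A small sketch of that — the `mask_rcnn_<name>_<epoch>.h5` naming is an assumption based on the Matterport repo's convention, so adjust the pattern to your actual filenames:

```python
import re

def latest_weights(filenames):
    """Pick the checkpoint with the highest epoch number from names like
    'mask_rcnn_people_0012.h5' (naming convention assumed)."""
    pattern = re.compile(r"mask_rcnn_\w+_(\d{4})\.h5$")
    matches = [(int(m.group(1)), f) for f in filenames
               if (m := pattern.search(f))]
    if not matches:
        raise FileNotFoundError("no Mask R-CNN checkpoints found")
    return max(matches)[1]  # highest epoch wins

files = ["mask_rcnn_people_0001.h5", "mask_rcnn_people_0012.h5",
         "mask_rcnn_people_0007.h5"]
print(latest_weights(files))  # mask_rcnn_people_0012.h5
```

Note that the highest-epoch checkpoint is not automatically the best one; if you track validation loss, you may want the epoch where it was lowest instead.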

Some important info:

  • Mask R-CNN is somewhat of an older algorithm, but it still works well today. Although some people have updated the codebase to TensorFlow 2.0+, to get the best use out of it, use the following versions:
  • Tensorflow-gpu 1.13.2+
  • Keras 2.0.0+
  • CUDA 9.0 to 10.0

Honestly, the hardest part for me in the past was not using the algorithm but finding versions of TensorFlow, Keras, and CUDA that all play well with each other and don't error out. Although the versions above will work, you can try upgrading or downgrading certain libraries to see if you get better results.


Below is an article about Mask R-CNN with video; I find it very useful and resourceful.

https://www.pyimagesearch.com/2018/11/19/mask-r-cnn-with-opencv/

The GitHub repo can be found below.

https://github.com/matterport/Mask_RCNN

EDIT

You can use this method across multiple cameras; just set up multiple video captures within a computer vision library like OpenCV. I assume this would be done in Python, in which both Mask R-CNN and OpenCV are primarily based.
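The cross-camera part the question is really after — giving one global ID to a person no matter which camera they appear in — can be sketched independently of any library. This is a toy matcher over appearance embeddings; the similarity threshold and the hand-made embedding vectors are assumptions, and in practice the embeddings would come from a re-identification network (e.g. a triplet-loss model), not be hand-crafted:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class GlobalIDAssigner:
    """Assign one global ID per person across cameras by matching each new
    detection's embedding against a shared gallery."""
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.gallery = {}   # global_id -> last embedding seen for that ID
        self.next_id = 0

    def assign(self, embedding):
        best_id, best_sim = None, self.threshold
        for gid, ref in self.gallery.items():
            sim = cosine(embedding, ref)
            if sim > best_sim:
                best_id, best_sim = gid, sim
        if best_id is None:          # no match: this is a new person
            best_id = self.next_id
            self.next_id += 1
        self.gallery[best_id] = embedding  # refresh the stored reference
        return best_id

assigner = GlobalIDAssigner()
a = assigner.assign([1.0, 0.0, 0.1])   # camera 1, person A
b = assigner.assign([0.0, 1.0, 0.1])   # camera 2, a different person
c = assigner.assign([0.9, 0.1, 0.1])   # camera 2, person A again
print(a, b, c)  # 0 1 0
```

Each camera's detection loop would call `assign()` on the embedding of every detected person, so the gallery is the only shared state between cameras.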

Aaron Jones
    Thank you for such a comprehensive response. But the problem I am facing is that I want to calibrate multiple cameras together so I can detect a person and assign an ID if he/she appears in any of the cameras. I can successfully do tracking (YOLOv3 for detection and DeepSORT for tracking) for a single camera, but I want to extend it to multiple cameras. If you know anything in that domain, please let me know. – user1932914 Apr 02 '20 at 10:36
2

The Intel OpenVINO framework has all the parts of this task:

  1. Object detection with a pretrained Faster R-CNN, SSD, or YOLO model.

  2. Re-identification models.

And a complete demo application. You can also use other models. Or, if you want to run detection on the GPU, use opencv_dnn_cuda for detection and OpenVINO for re-identification.

Nuzhny
  • Yes, I have seen it before. But it is all a black box, and if you want to modify something or fine-tune it for your purpose, you cannot. Also, the installation process of the OpenVINO framework is far too complicated and unclear, in my view. – user1932914 Apr 08 '20 at 13:17
  • And models for OpenVINO: https://github.com/opencv/open_model_zoo – Nuzhny Apr 08 '20 at 18:07