What is the format for the training/testing data for a Computer Vision model

Question

I am trying to build a CV model for detecting objects in videos. I have about 6 videos that have the content I need to train my model. These are things like lanes, other vehicles, etc. that I’m trying to detect.

I’m curious about the format of the dataset I need to train my model with. I can either extract every frame of each video as an image and build a large repository of images to train with, or I can use the videos directly. Which way do you think is better?

I apologize if this isn't directly a programming question. I'm trying to assemble my data and I couldn't make up my mind about this.

If you plan to use `tensorflow`, you can use `tf.data.Dataset` to preprocess these videos and store them as TFRecord files, which `TFRecordDataset` can then read. It is probably a bit more work, but the result is easier to operate on in `tf`. [This](https://stackoverflow.com/questions/48101576/tensorflow-read-video-frames-from-tfrecords-file) is one answer on SO for preprocessing your videos. — Siddhant Tandon, Jan 15 '20 at 00:00

score 1 · Accepted Answer · answered Jan 15 '20 at 05:56

YOLO version 3 is a good starting point. The trained model consists of a `.weights` file and a `.cfg` file, which can be used to detect objects from a webcam, in a video file on a computer, or on Android with OpenCV.

In OpenCV for Python, `cv.dnn.readNetFromDarknet("yolov3_tiny.cfg", "CarDetector.weights")` can be used to load the trained model.

On Android, the code is similar:

String tinyYoloCfg = getPath("yolov3_tiny.cfg", this);
String tinyYoloWeights = getPath("CarDetector.weights", this);
Net tinyYolo = Dnn.readNetFromDarknet(tinyYoloCfg, tinyYoloWeights);

The function reference can be found here: https://docs.opencv.org/4.2.0/d6/d0f/group__dnn.html

Your video frames need to be annotated with a tool that generates bounding boxes in YOLO format; quite a few such tools are available. To train a custom model, this repository contains all the necessary information: https://github.com/AlexeyAB/darknet
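To make "YOLO format" concrete: each image gets a `.txt` file with the same base name, holding one line per object in the form `class_id x_center y_center width height`, with all four box values normalized to the image size. Here is a minimal sketch of that conversion from a pixel-space box. The function name `to_yolo_line`, the class id, and the frame size are hypothetical and only for illustration; an annotation tool would normally write these lines for you.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to one Darknet/YOLO label line.

    The helper name and argument order are illustrative assumptions,
    not part of Darknet or OpenCV.
    """
    # Box center and size, normalized to the range [0, 1]
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: a vehicle (class 0) in a 1280x720 frame
line = to_yolo_line(0, 320, 180, 960, 540, 1280, 720)
print(line)  # 0 0.500000 0.500000 0.500000 0.500000
```

A frame saved as, say, `frame_0001.jpg` would then have its label lines stored in `frame_0001.txt` alongside it.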
