
I am using the dlib library for ocular recognition, so I am planning to train my own classifier using the options given in the documentation. I am using Python rather than C++.

So, I have created the two .xml files (training and testing) using the imglab tool. Do I have to label all the subject names in the imglab tool? I have close to 20,000 images, so isn't that going to be difficult? Is there an easier way of doing it? The code matching this scenario is attached below.
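If the bounding boxes are already known for each image (for example, because each cropped image is entirely the ocular region, as described above), one way to avoid hand-labeling 20,000 images in imglab is to generate the imglab-style XML file programmatically. A minimal sketch; the file names and box coordinates here are hypothetical placeholders:

```python
# Sketch: write an imglab-compatible XML file from known box coordinates,
# instead of drawing every box by hand in the imglab GUI.
from xml.sax.saxutils import escape


def write_imglab_xml(entries, out_path):
    """entries: list of (image_path, [(top, left, width, height), ...])."""
    lines = ["<?xml version='1.0' encoding='ISO-8859-1'?>",
             "<dataset>",
             "<name>auto-generated</name>",
             "<images>"]
    for path, boxes in entries:
        lines.append("  <image file='{}'>".format(escape(path)))
        for top, left, width, height in boxes:
            lines.append("    <box top='{}' left='{}' width='{}' height='{}'/>"
                         .format(top, left, width, height))
        lines.append("  </image>")
    lines += ["</images>", "</dataset>"]
    with open(out_path, "w") as f:
        f.write("\n".join(lines))


# Hypothetical example: every image is fully occupied by the ocular region.
entries = [("img_0001.png", [(0, 0, 200, 100)]),
           ("img_0002.png", [(0, 0, 200, 100)])]
write_imglab_xml(entries, "praveen_ocular_dataset.xml")
```

The resulting file follows the same `<dataset>/<images>/<image>/<box>` layout that imglab itself produces, so dlib's training functions can read it directly.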

import os
import sys
import glob

import dlib
from skimage import io


# In this example we are going to train a detector on the NIR ocular
# training images.  Supply the path to that folder here so the script
# knows where the data lives.

faces_folder = "/media/praveen/SSD/NIVL_Ocular/NIR_Ocular_Training"


# Now let's do the training.  The train_simple_object_detector() function has a
# bunch of options, all of which come with reasonable default values.  The next
# few lines go over some of these options.
options = dlib.simple_object_detector_training_options()
# Since faces are left/right symmetric, the trainer can be told to train a
# symmetric detector by setting add_left_right_image_flips = True, which gets
# the most value out of the training data.  It is left disabled here.
options.add_left_right_image_flips = False
# The trainer is a kind of support vector machine and therefore has the usual
# SVM C parameter.  In general, a bigger C encourages it to fit the training
# data better but might lead to overfitting.  You must find the best C value
# empirically by checking how well the trained detector works on a test set of
# images you haven't trained on.  Don't just leave the value set at 5.  Try a
# few different C values and see what works best for your data.
options.C = 5
# Tell the code how many CPU cores your computer has for the fastest training.
options.num_threads = 4
options.be_verbose = True


# These XML paths are absolute, so joining them with faces_folder is
# pointless: os.path.join discards all earlier components when a later
# one is absolute.  Assign them directly instead.
training_xml_path = "/media/praveen/SSD/NIVL_Ocular/praveen_ocular_dataset.xml"
testing_xml_path = "/media/praveen/SSD/NIVL_Ocular/praveen_ocular_test_dataset.xml"
# This function does the actual training.  It will save the final detector to
# detector.svm.  The input is an XML file that lists the images in the training
# dataset and also contains the positions of the face boxes.  To create your
# own XML files you can use the imglab tool which can be found in the
# tools/imglab folder.  It is a simple graphical tool for labeling objects in
# images with boxes.  To see how to use it read the tools/imglab/README.txt
# file.  Here we use the XML files created with imglab above.
dlib.train_simple_object_detector(training_xml_path, "detector.svm", options)



# Now that we have a face detector we can test it.  The first statement tests
# it on the training data.  It will print the precision, recall, and then
# average precision.
print("")  # Print blank line to create gap from previous output
print("Training accuracy: {}".format(
    dlib.test_simple_object_detector(training_xml_path, "detector.svm")))
# However, to get an idea if it really worked without overfitting we need to
# run it on images it wasn't trained on.  The next line does this.  Happily, we
# see that the object detector works perfectly on the testing images.
print("Testing accuracy: {}".format(
    dlib.test_simple_object_detector(testing_xml_path, "detector.svm")))




#
# # Now let's use the detector as you would in a normal application.  First we
# # will load it from disk.
# detector = dlib.simple_object_detector("detector.svm")
#
# # We can look at the HOG filter we learned.  It should look like a face.  Neat!
# win_det = dlib.image_window()
# win_det.set_image(detector)
#
# # Now let's run the detector over the images in the faces folder and display the
# # results.
# print("Showing detections on the images in the faces folder...")
# win = dlib.image_window()
# for f in glob.glob(os.path.join(faces_folder, "*.png")):
#     print("Processing file: {}".format(f))
#     img = io.imread(f)
#     dets = detector(img)
#     print("Number of faces detected: {}".format(len(dets)))
#     for k, d in enumerate(dets):
#         print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
#             k, d.left(), d.top(), d.right(), d.bottom()))
#
#     win.clear_overlay()
#     win.set_image(img)
#     win.add_overlay(dets)
#     dlib.hit_enter_to_continue()
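One pitfall worth noting in the script above: `os.path.join` drops every earlier component as soon as it encounters an absolute path, so joining `faces_folder` with the absolute XML paths silently ignores the folder. A minimal sketch of that behaviour (POSIX-style paths assumed):

```python
# Sketch: os.path.join discards all earlier components as soon as a later
# component is an absolute path, so joining a folder with an absolute XML
# path silently ignores the folder.
import os

faces_folder = "/media/praveen/SSD/NIVL_Ocular/NIR_Ocular_Training"

# Absolute second argument: the first argument is dropped entirely.
p1 = os.path.join(faces_folder,
                  "/media/praveen/SSD/NIVL_Ocular/praveen_ocular_dataset.xml")
assert p1 == "/media/praveen/SSD/NIVL_Ocular/praveen_ocular_dataset.xml"

# Relative second argument: the components are actually joined.
p2 = os.path.join(faces_folder, "training.xml")
assert p2 == "/media/praveen/SSD/NIVL_Ocular/NIR_Ocular_Training/training.xml"
```

Either drop the join and assign the absolute paths directly, or keep the XML files relative to `faces_folder` so the join actually does something.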

1 Answer

Simply, yes, because you want to train dlib's object detector, which requires a dataset labeled with bounding boxes, unless you use an already-available labeled dataset.

Also, the main purpose of imglab is creating bounding boxes, as it says in your own code comments:

To create your own XML files you can use the imglab tool which can be found in the tools/imglab folder. It is a simple graphical tool for labeling objects in images with boxes.

For the original paper please refer to: https://arxiv.org/pdf/1502.00046v1.pdf

Actually, as you said, it is really hard. One of the main challenges in object detection and recognition is creating the dataset. That is why researchers use crowdsourcing sites like Mechanical Turk to harness the power of the crowd.

  • Thanks for your advice. What did you mean by a labelled dataset? The dataset that I have in place has the images labeled by subject. I have cropped only the ocular regions of the face, and they will be the bounding boxes in each picture. – Praveen Nov 04 '16 at 20:15
  • I think I got it wrong. Are you trying to discriminate each person's retina from the others? I mean, you want the detector to say "this is John's retina and this is Brad's retina", right? – cagatayodabasi Nov 04 '16 at 21:52
  • Then this function is not a method you can use. It can only find the object you want, and unfortunately it is a binary classifier, so it can just say yes or no. I think you should go with scikit-learn; I also see skimage in your code, so I guess it should be very easy for you to use. Check out their catalog http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html and pick a multi-output algorithm. – cagatayodabasi Nov 08 '16 at 08:44
  • Thanks for your comment. However, dlib has functionality to predict the scores of every input image after comparing it with the gallery, and that's what I was using. – Praveen Nov 08 '16 at 18:08
  • I don't think so. I have worked with it before, and it can only find human faces and give a confidence score, as it says here: "Finally, if you really want to you can ask the detector to tell you the score for each detection. The score is bigger for more confident detections." By the way, please don't get me wrong, I'm just trying to help :) – cagatayodabasi Nov 08 '16 at 20:11
  • @cata: Yes, I understand that you are trying to help and I really appreciate it. But doesn't the confidence score tell you the similarity of the test image and the trained one? Let's say I am comparing person 1 with person 1: if the score is more than a threshold, say 80, I can assume the prediction is right, correct? – Praveen Nov 08 '16 at 22:04
  • You should not do that, because it probably learns the general structure of the ocular region and can produce a lot of false positives. – cagatayodabasi Nov 09 '16 at 07:02
  • What other method do you suggest? – Praveen Nov 15 '16 at 19:34
  • Sorry, I am not specialized in this type of recognition. – cagatayodabasi Nov 17 '16 at 07:04
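For reference, the scikit-learn route suggested in the comments could be sketched roughly like this: treat each subject as a class and train a multi-class classifier on feature vectors extracted from the cropped ocular images. The data below is synthetic and the parameters are illustrative; in practice you would substitute real features (e.g. `skimage.feature.hog` per image):

```python
# Sketch: multi-class subject recognition with scikit-learn, in contrast to
# dlib's binary object detector.  Synthetic Gaussian clusters stand in for
# real per-image feature vectors.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_subjects, per_subject, n_features = 5, 20, 64

# One Gaussian cluster of feature vectors per subject.
X = np.vstack([rng.normal(loc=i, scale=0.5, size=(per_subject, n_features))
               for i in range(n_subjects)])
y = np.repeat(np.arange(n_subjects), per_subject)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# SVC handles multi-class via one-vs-one internally; predict() returns a
# subject label rather than the binary yes/no of a single object detector.
clf = SVC(kernel="linear").fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

Unlike thresholding a detector's confidence score, this setup is explicitly trained to discriminate subjects from one another, which is what the question ultimately asks for.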