
Task: Classify images of human faces as female or male. Training images with labels are available; the test image is obtained from a webcam.

Using: Python 2.7, OpenCV 2.4.4

I am using ORB to extract features from grayscale images, which I hope to use to train a K-Nearest Neighbor classifier. Each training image is of a different person, so the number of keypoints and descriptors for each image is obviously different. My problem is that I'm not able to make sense of the OpenCV docs for KNN and ORB. I've seen other SO questions about ORB, KNN and FLANN, but they didn't help much.

What exactly is the nature of the descriptor produced by ORB? How does it differ from the descriptors obtained by BRIEF, SURF, SIFT, etc.?

It seems that the feature descriptors should be of the same size for each training sample in KNN. How do I make sure the descriptors are the same size for every image? More generally, in what format should features be presented to KNN for training with given data and labels? Should the data be int or float? Can it be char?

The training data can be found here.

I am also using haarcascade_frontalface_alt.xml from the OpenCV samples.

Right now the KNN model is given just 10 training images, to check that my program runs without errors, which it does not.

Here is my code:

import cv2
import numpy as np

def chooseCascade():
    # TODO: Option for different cascades
    # HAAR Classifier for frontal face
    _cascade = cv2.CascadeClassifier('haarcascade_frontalface_alt.xml')
    return _cascade

def cropToObj(cascade,imageFile):
    # Load as 1-channel grayscale image
    image = cv2.imread(imageFile,0)

    # Crop to the object of interest in the image
    objRegion = cascade.detectMultiScale(image) # TODO: What if multiple objects in image?

    x1 = objRegion[0,0]
    y1 = objRegion[0,1]
    x1PlusWidth = objRegion[0,0]+objRegion[0,2]
    y1PlusHeight = objRegion[0,1]+objRegion[0,3]

    _objImage = image[y1:y1PlusHeight,x1:x1PlusWidth]

    return _objImage

def recognizer(fileNames):
    # ORB constructor
    orb = cv2.ORB(nfeatures=100)

    keyPoints = []
    descriptors = [] 

    # A cascade for face detection
    haarFaceCascade = chooseCascade()

    # Start processing images
    for imageFile in fileNames:
        # Find faces using the HAAR cascade
        faceImage = cropToObj(haarFaceCascade,imageFile)

        # Extract keypoints and description 
        faceKeyPoints, faceDescriptors = orb.detectAndCompute(faceImage, mask = None)

        #print faceDescriptors.shape
        descRow = faceDescriptors.shape[0]
        descCol = faceDescriptors.shape[1]

        flatFaceDescriptors = faceDescriptors.reshape(descRow*descCol).astype(np.float32)

        keyPoints.append(faceKeyPoints)
        descriptors.append(flatFaceDescriptors)

    print descriptors

    # KNN model and training on descriptors
    responses = []
    for name in fileNames:
        if name.startswith('BF'):
            responses.append(0) # Female
        else:
            responses.append(1) # Male

    knn = cv2.KNearest()
    knnTrainSuccess = knn.train(descriptors,
                                responses,
                                isRegression = False) # isRegression = false, implies classification

    # Obtain test face image from cam
    capture = cv2.VideoCapture(0)
    closeCamera = -1
    while(closeCamera < 0):
        _retval, _camImage = capture.read()

        # Find face in camera image
        testFaceImage = haarFaceCascade.detectMultiScale(_camImage) # TODO: What if multiple faces?

        # Keypoints and descriptors of test face image
        testFaceKP, testFaceDesc = orb.detectAndCompute(testFaceImage, mask = None)
        testDescRow = testFaceDesc.shape[0]
        testDescCol = testFaceDesc.shape[1]
        flatTestFaceDesc = testFaceDesc.reshape(1,testDescRow*testDescCol).astype(np.float32)

        # Args in knn.find_nearest: testData, k
        returnedValue, result, neighborResponse, distance = knn.find_nearest(flatTestFaceDesc,3) 

        print returnedValue, result, neighborResponse, distance


        # Display results
        # TODO: Overlay classification text
        cv2.imshow("testImage", _camImage)

        closeCamera = cv2.waitKey(1)
    cv2.destroyAllWindows()


if __name__ == '__main__':
    fileNames = ['BF09NES_gray.jpg', 
                 'BF11NES_gray.jpg', 
                 'BF13NES_gray.jpg', 
                 'BF14NES_gray.jpg', 
                 'BF18NES_gray.jpg', 
                 'BM25NES_gray.jpg', 
                 'BM26NES_gray.jpg', 
                 'BM29NES_gray.jpg', 
                 'BM31NES_gray.jpg', 
                 'BM34NES_gray.jpg']

    recognizer(fileNames)

Currently I am getting an error at the knn.train() line because descriptors is a Python list, not the numpy array OpenCV expects.

Also, is this approach completely wrong? Should I be using some other method for gender classification? I wasn't satisfied with the Fisherfaces and Eigenfaces examples in the OpenCV facerec demo, so please don't direct me to those.

Any other help is much appreciated. Thanks.

--- EDIT ---

I've tried a few things and posted what I came up with as an answer below.


The database I am using for training is the Karolinska Directed Emotional Faces.

samkhan13

2 Answers


I have some doubts about the effectiveness/workability of the described approach. Here's another approach that you might want to consider. The contents of the gen folder are at http://www1.datafilehost.com/d/0f263abc. As you will note, when the data size gets bigger (~10k training samples), the size of the model may become unacceptable (~100-200 MB). Then you will need to look into PCA/LDA etc.

import cv2
import numpy as np
import os

def feaCnt():
    # Probe the feature vector length by running the extractor on a dummy image
    mat = np.zeros((400,400,3),dtype=np.uint8)
    ret = extr(mat)
    return len(ret)

def extr(img):
    # Feature extractor; currently the Sobel-style filter bank below
    return sobel(img)

def sobel(img):
    # Filter bank of 8 directional 3x3 kernels (horizontal, vertical and
    # diagonal derivatives) to pull coarse gradient/phase info from the image
    gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
    klr = [[-1,0,1],[-2,0,2],[-1,0,1]]
    kbt = [[1,2,1],[0,0,0],[-1,-2,-1]]
    ktb = [[-1,-2,-1],[0,0,0],[1,2,1]]
    krl = [[1,0,-1],[2,0,-2],[1,0,-1]]
    kd1 = [[0,1,2],[-1,0,1],[-2,-1,0]]
    kd2 = [[-2,-1,0],[-1,0,1],[0,1,2]]    
    kd3 = [[0,-1,-2],[1,0,-1],[2,1,0]]
    kd4 = [[2,1,0],[1,0,-1],[0,-1,-2]]
    karr = np.asanyarray([
        klr,
        kbt,
        ktb,
        krl,
        kd1,
        kd2,
        kd3,
        kd4
        ])
    gray = cv2.resize(gray,(40,40))   # shrink to reduce processing time
    # One 15x15 response map per kernel -> fixed-length vector of 15*15*8 = 1800
    res = np.float32([cv2.resize(cv2.filter2D(gray, -1,k),(15,15)) for k in karr])
    return res.flatten()


root = 'C:/data/gen'

model='c:/data/models/svm/gen.xml'
imgs = []
idx =0
# Directory layout: root/<label>/<image files>, where <label> is the class id
for path, subdirs, files in os.walk(root):
  for name in files:  
    p = path[len(root):].split('\\')
    p.remove('')
    lbl = p[0]
    fpath = os.path.join(path, name)
    imgs.append((fpath,int(lbl)))
    idx+=1

samples = np.zeros((len(imgs),feaCnt()),dtype = np.float32)
labels = np.zeros(len(imgs),dtype = np.float32)

i = 0
for f,l in imgs:
  print i
  img = cv2.imread(f)
  samples[i]=extr(img)
  labels[i]=l
  i+=1

svm = cv2.SVM()
svmparams = dict( kernel_type = cv2.SVM_POLY, 
                       svm_type = cv2.SVM_C_SVC,
                       degree=3.43,
                       gamma=1.5e-4,
                       coef0=1e-1,
                       )
print 'svm train'
svm.train(samples,labels,params=svmparams)
svm.save(model)
print 'done'

result = np.float32( [(svm.predict(s)) for s in samples])
correct=0.
total=0.

for i,j in zip(result,labels):
    total+=1
    if i==j:
        correct+=1
    print '%f'%(correct/total)   # running training-set accuracy
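
As noted above, the stored support vectors (1800 values each) can blow up the model size. Here is a minimal, untested sketch of compressing the vectors with PCA before training; the 100-component count is an assumption, not a tuned value:

# Project the 1800-dim feature vectors onto their first 100 principal components
mean, eigenvectors = cv2.PCACompute(samples, maxComponents=100)
reduced = cv2.PCAProject(samples, mean, eigenvectors)   # (nSamples, 100) float32
# Train the SVM on 'reduced' instead of 'samples', and project test vectors
# with the same mean/eigenvectors before calling svm.predict()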
Zaw Lin
  • thank you very much for your solution. I think you are correct in noting that a model using ORB and KNN would not be practical with larger training data. Can you tell me about the parameters that you used for the Sobel filter and the SVM? Did you obtain them from a journal paper or your own research? Also, can you tell me why you resize the image to (40,40)? – samkhan13 Oct 08 '13 at 06:03
  • I wasn't referring to ORB/KNN when commenting about data size; I was referring to the posted method. The SVM parameters came from the train_auto method with 10-fold cross validation on a total set of ~12k samples (see the sketch after these comments). The Sobel ones are perhaps not very scientific, but intuitively I see it as extracting 8 equal directional derivatives (0, 45, 90 degrees etc.), which is not exactly what it's doing but close enough, using a filter bank of 8. That's just a simple way to get phase info out of the image. You can replace Sobel with other things; it's just to get you started. The resize to 40x40 is to reduce processing time. – Zaw Lin Oct 08 '13 at 06:28
  • There's also another resize going on, to (15,15). That's because you need a fixed-length vector for the SVM, and it also helps reduce the final model size. The total size of each vector is 15x15x8 (1800). That's pretty big! So when the training data size increases, the stored support vectors (each of length 1800!) will blow up the stored model size. That's why you may need to do PCA/LDA before the SVM. If you need data, search for 'morph'. This task is pretty sensitive to racial traits, so you may want to ensure your training data only contains subjects of one racial type (western, african, asian etc.) if possible. – Zaw Lin Oct 08 '13 at 06:41
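
For reference, a rough, untested sketch of the train_auto call mentioned in the comments above (10-fold cross validation; the parameter grids are left at their defaults):

svm = cv2.SVM()
base = dict(kernel_type=cv2.SVM_POLY, svm_type=cv2.SVM_C_SVC)
# train_auto grid-searches C, gamma, degree, coef0 etc., scoring each
# candidate with k-fold cross validation (k = 10 here)
svm.train_auto(samples, labels, None, None, base, 10)
svm.save('gen_auto.xml')   # hypothetical output path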

Previously, I was struggling to find the technical difference between ORB, SIFT, SURF, etc., and I found several other SO posts helpful.

The most important thing to note is that these feature detection algorithms in OpenCV require a single-channel (typically 8-bit) grayscale image.
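
For example, converting a camera frame before feature extraction (the same cvtColor call appears in my camera loop below):

gray = cv2.cvtColor(camImage, cv2.COLOR_BGR2GRAY)   # 3-channel BGR -> 8-bit gray
keyPoints, descriptors = orb.detectAndCompute(gray, None)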

It turns out that knn.train() only accepts an 'array' whose data type is 32-bit floating point. I believe SVM training in OpenCV has the same requirement. In Python, a numpy array must hold the same data type in every row, and all rows must have the same shape, unlike Python lists, which can hold data of any type and size.

So after growing a list of the descriptors I converted the list to an array.
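
For reference, a minimal sketch of that conversion (using the descriptors and responses lists built in my code below):

import numpy as np

# Each list element is an 800-element 1-D array (one per image), so the
# list stacks cleanly into an (nSamples, 800) float32 matrix for knn.train()
trainData = np.asarray(descriptors, dtype=np.float32)
labels = np.asarray(responses, dtype=np.float32)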

But! Before that, I hardcoded the ORB nfeatures parameter to 25. All my training images are of roughly the same resolution, and I was able to manually verify that each image yields at least 25 keypoints with ORB. Each keypoint has a 32-byte descriptor, so 25 keypoints give 25*32 = 800 descriptor values for each face image. ORB returns an integer-typed (uint8) array with one row per keypoint, which I reshaped into a single row to produce a 'vector' of size 800.
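
To illustrate the shapes involved (the [:25] slice is a defensive guard of my own; with nfeatures=25 ORB should already cap the count, though it can return fewer):

# faceDescriptors has shape (nKeypoints, 32) and dtype uint8
flat = faceDescriptors[:25].reshape(25 * 32)   # 800-element row vector
flat = flat.astype(np.float32)                 # KNN wants float32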

The next challenge was using knn.find_nearest(). It requires a 'matrix' whose rows are identical in shape to the rows of the ndarray given to knn.train(). Otherwise it produces an error:

OpenCV Error: Bad argument (Input samples must be floating-point matrix (<num_samples>x<var_count>)) in find_nearest

Even if you have a single vector to pass to knn.find_nearest(), it needs to have shape 1xm, where m is the number of elements in the vector.
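
So a single query has to be reshaped before the call; a minimal sketch:

# A single query must be a 1 x m float32 matrix, not a flat m-vector
query = testFaceDesc.reshape(1, -1).astype(np.float32)   # shape (1, 800)
ret, results, neighbours, dists = knn.find_nearest(query, 5)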

So I had to hack together a crude way to check that the image taken by my webcam was usable within my hardcoded approach to the problem.

The code looks like this now:

import cv2
import numpy as np

def chooseCascade():
    # TODO: Option for different cascades
    # HAAR Classifier for frontal face
    _cascade = cv2.CascadeClassifier('haarcascade_frontalface_alt.xml')
    return _cascade

def cropToObj(cascade,imageFile,flag):
    if flag == 0:
        # Load as 1-channel grayscale image
        image = cv2.imread(imageFile,0)
    elif flag == 1:
        # Load as 3-channel color image
        image = cv2.imread(imageFile,1)
    elif flag == -1: 
        # Load image as is 
        image = cv2.imread(imageFile,-1)
    elif flag == 2:
        # Image is from camera
        image = imageFile
    else:
        print 'improper arguments passed to cropToObj'
        return None   # bail out; no image was loaded

    # Crop to the object of interest in the image
    objRegion = cascade.detectMultiScale(image) # TODO: What if multiple objects in image?

    x1 = objRegion[0,0]
    y1 = objRegion[0,1]
    x1PlusWidth = objRegion[0,0]+objRegion[0,2]
    y1PlusHeight = objRegion[0,1]+objRegion[0,3]

    objImage = image[y1:y1PlusHeight,x1:x1PlusWidth]

    return objImage

def recognizer(fileNames):
    # ORB constructor
    orb = cv2.ORB(nfeatures=25)

    keyPoints = []
    descriptors = [] 

    # A cascade for face detection
    haarFaceCascade = chooseCascade()

    # Start processing images
    for imageFile in fileNames:
        # Find faces using the HAAR cascade
        faceImage = cropToObj(haarFaceCascade,imageFile,0) # 0 -> load file as grayscale

        # Extract keypoints and description 
        faceKeyPoints, faceDescriptors = orb.detectAndCompute(faceImage, mask = None)

        #print faceDescriptors.shape
        descRow = faceDescriptors.shape[0]
        descCol = faceDescriptors.shape[1]

        flatFaceDescriptors = faceDescriptors.reshape(descRow*descCol)

        keyPoints.append(faceKeyPoints)
        descriptors.append(flatFaceDescriptors)

    descriptors = np.asarray(descriptors, dtype=np.float32)

    # KNN model and training on descriptors
    responses = []
    for name in fileNames:
        if name.startswith('BF'):
            responses.append(0) # Female
        else:
            responses.append(1) # Male

    responses = np.asarray(responses)

    knn = cv2.KNearest()
    knnTrainSuccess = knn.train(descriptors,
                                responses,
                                isRegression = False) # isRegression = false, implies classification

    # Obtain test face image from cam
    capture = cv2.VideoCapture(0)
    closeCamera = -1
    while(closeCamera < 0):
        retval, camImage = capture.read()      

        # Find face in camera image
        try:
            testFaceImage = cropToObj(haarFaceCascade, camImage, 2) # TODO: What if multiple faces?
            testFaceImage = cv2.cvtColor(testFaceImage, cv2.COLOR_BGR2GRAY)
        except TypeError:
            print 'check if front face is visible to camera'
            continue

        # Keypoints and descriptors of test face image
        testFaceKP, testFaceDesc = orb.detectAndCompute(testFaceImage, mask = None)
        testDescRow = testFaceDesc.shape[0]
        testDescCol = testFaceDesc.shape[1]
        flatTestFaceDesc = testFaceDesc.reshape(1,testDescRow*testDescCol)
        flatTestFaceDesc = np.asarray(flatTestFaceDesc,dtype=np.float32) 

        if flatTestFaceDesc.size == 800:
            # Args in knn.find_nearest: testData, k
            returnedValue, result, neighborResponse, distance = knn.find_nearest(flatTestFaceDesc,5)
            if returnedValue == 0.0:
                print 'Female'
            else:
                print 'Male'
        else: 
            print 'insufficient size of image' 

        # Display results
        # TODO: Overlay classification text
        cv2.imshow("testImage", camImage)

        closeCamera = cv2.waitKey(1)
    cv2.destroyAllWindows()


if __name__ == '__main__':
    fileNames = ['BF09NES_gray.jpg', 
                 'BF11NES_gray.jpg', 
                 'BF13NES_gray.jpg', 
                 'BF14NES_gray.jpg', 
                 'BF18NES_gray.jpg', 
                 'BM25NES_gray.jpg', 
                 'BM26NES_gray.jpg', 
                 'BM29NES_gray.jpg', 
                 'BM31NES_gray.jpg', 
                 'BM34NES_gray.jpg']

    recognizer(fileNames)

I am still hoping that someone in the SO community can suggest an idea so that I don't have to hardcode things into my solution. I also suspect that knn.find_nearest() isn't doing what I need it to do.

And, as expected, the recognizer is not at all accurate and is very prone to misclassification due to rotation, lighting, etc. Any suggestions on improving this approach would be really appreciated.
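
Following up on the BOW pointer in the comments below, which exists precisely to turn variable-length descriptor sets into fixed-length vectors: here is a minimal, untested sketch. The vocabulary size K is an assumption, and clustering/matching binary ORB descriptors in float space with Euclidean distance is a simplification (Hamming distance is the natural metric for ORB):

import cv2
import numpy as np

K = 50   # vocabulary size (assumption)
orb = cv2.ORB(nfeatures=100)
bowTrainer = cv2.BOWKMeansTrainer(K)

# 1) Pool float32 copies of the ORB descriptors from every training image
for imageFile in fileNames:
    img = cv2.imread(imageFile, 0)
    kp, desc = orb.detectAndCompute(img, None)
    if desc is not None:
        bowTrainer.add(np.float32(desc))

vocabulary = bowTrainer.cluster()   # K x 32 float32 cluster centres

# 2) Map any image's descriptors to a fixed-length K-bin histogram
def bowHistogram(desc):
    hist = np.zeros(K, dtype=np.float32)
    for d in np.float32(desc):
        hist[np.argmin(((vocabulary - d) ** 2).sum(axis=1))] += 1
    return hist / max(hist.sum(), 1)

Every image then maps to the same K-dimensional vector regardless of how many keypoints ORB finds, which would remove the need to hardcode nfeatures.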

samkhan13
  • a quick comment: just found out about BOW, seems relevant. See this SO answer (http://stackoverflow.com/questions/15611872/bow-in-opencv-using-precomputed-features), and here (https://groups.google.com/forum/#!topic/accord-net/u5viBhgv0Fw) it says: "The Bag of Visual Words serves one purpose and one purpose only: to translate variable length feature representations into fixed-length feature representations." – Zaw Lin Oct 11 '13 at 11:26
  • @ZawLin thanks for the comment. If you paste it as an answer I can accept it. – samkhan13 Oct 17 '13 at 20:01