I have found an excellent, comprehensive answer related to this topic here, using OpenCV and SVM with images. However, I have a few questions about that answer which I would like to clarify (since I do not have enough reputation to write a comment).

What I've been doing: I am using OpenCV's SVM for training. The features in the training matrix are obtained by computing the normalized mean R, G, and B values for each image. Thus, each row of the training matrix (one row per image) has 4 columns: the label (1 or 0), followed by the normalized mean of the R channel, the G channel, and the B channel.
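
For reference, here is roughly how I compute one feature row (just a sketch; normalizedMeanRGB is an illustrative name, and cv::mean returns per-channel means with OpenCV loading color images in BGR order):

#include <opencv2/opencv.hpp>

// Sketch: one feature row = normalized mean R, G, B of a single image.
cv::Vec3f normalizedMeanRGB(const cv::Mat& bgr)
{
    cv::Scalar m = cv::mean(bgr);    // m[0]=B, m[1]=G, m[2]=R
    double sum = m[0] + m[1] + m[2];
    return cv::Vec3f(static_cast<float>(m[2] / sum),   // normalized mean R
                     static_cast<float>(m[1] / sum),   // normalized mean G
                     static_cast<float>(m[0] / sum));  // normalized mean B
}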

By the way, my original training file is a text file, which I convert to a float[][] and eventually into a Mat object to feed into OpenCV's SVM. Here is what the file looks like:

1 0.267053 0.321014 0.411933
1 0.262904 0.314294 0.422802
.
.
0 0.29101 0.337208 0.371782
0 0.261792 0.314494 0.423714
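
The conversion itself is simple; here is a sketch of what I do, assuming the four whitespace-separated columns shown above (loadTrainingFile is just an illustrative name):

#include <fstream>
#include <sstream>
#include <opencv2/opencv.hpp>

// Sketch: parse the text file above into a 3-column CV_32FC1 feature Mat
// and a one-column CV_32SC1 label Mat (one row per image).
void loadTrainingFile(const std::string& filename, cv::Mat& features, cv::Mat& labels)
{
    std::ifstream file(filename.c_str());
    std::string line;
    while (std::getline(file, line))
    {
        std::stringstream ss(line);
        int label;
        float r, g, b;
        if (ss >> label >> r >> g >> b)
        {
            cv::Mat row = (cv::Mat_<float>(1, 3) << r, g, b);
            features.push_back(row);   // append one 1x3 feature row
            labels.push_back(label);   // extends a one-column CV_32SC1 Mat
        }
    }
}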

Apparently, this contradicts the statement in the linked answer that the size of each row must be equal to the size of the image. Is that a protocol or some kind of rule? I just cannot make sense of why it should be (if it is).

My question is: in constructing the training matrix, does the length of each row have to correspond to the area (size) of the image? In the training matrix I've made, the length of each row is only 4. Is this wrong?

In addition, are only 3 features (3 columns) enough for training an SVM classifier? Please guide me to the right path; I'm doubting whether I should continue with this or whether there is a better approach to the problem.

I hope to understand more of the concepts behind the steps of SVM. Articles or related samples would be appreciated!


2 Answers

The size of each row does not have to equal the image size; it depends on what you use as features. Using mean values alone for image classification is not enough. Just think about how you classify objects when you look at a picture: you don't calculate mean values, but you probably look at contours, connected areas, and sometimes individual pixel values in the background processing of the brain.

So, to get more features, I have a suggestion for you: calculate each column's mean value in the feature extraction step. This will probably be more useful.
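
Here is a sketch of that idea (columnMeanFeatures is an illustrative name): split the image into its channels and take the column-wise mean of each with cv::reduce, so a WxH color image yields W*3 features.

#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: column-wise channel means as a single feature row.
cv::Mat columnMeanFeatures(const cv::Mat& bgr)
{
    std::vector<cv::Mat> channels;
    cv::split(bgr, channels);   // separate B, G, R planes

    cv::Mat features;
    for (size_t c = 0; c < channels.size(); c++)
    {
        cv::Mat colMeans;
        // reduce along dimension 0 (rows) -> one row of per-column averages
        cv::reduce(channels[c], colMeans, 0, CV_REDUCE_AVG, CV_32FC1);
        features.push_back(colMeans);
    }
    return features.reshape(1, 1);   // flatten to a single 1 x (W*3) row
}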

And for yet another kind of feature extraction you can use PCA. Normally you could feed all pixel values in a row for training the SVM, but even for a 200*200 image this makes 40,000 features, which is far too many. You need to reduce this feature dimension without losing much information, meaning you retain an acceptable percentage of the variance. That is what PCA is used for: reducing the dimension of the feature space while retaining the variance at an acceptable rate.

I will try to show you how you can reduce the feature space with PCA. First you acquire the images, then you roll each image into one row of a Mat variable:

Reading csv:

void read_csv(const string& filename, vector<Mat>& images, vector<int>& labels, char separator = ';')
{
    std::ifstream file(filename.c_str(), ifstream::in);
    if (!file)
    {
        string error_message = "No valid input file was given, please check the given filename.";
        CV_Error(CV_StsBadArg, error_message);
    }
    string line, path, classlabel;
    while (getline(file, line))
    {
        stringstream liness(line);

        getline(liness, path, separator);
        getline(liness, classlabel);

        if(!path.empty() && !classlabel.empty())
        {
            Mat im = imread(path, 0); // 0 -> load as grayscale

            images.push_back(im);
            labels.push_back(atoi(classlabel.c_str()));
        }
    }
}

Rolling images row by row:

Mat rollVectortoMat(const vector<Mat> &data) // data is a vector of Mat images
{
   Mat dst(static_cast<int>(data.size()), data[0].rows*data[0].cols, CV_32FC1);
   for(unsigned int i = 0; i < data.size(); i++)
   {
      Mat image_row = data[i].clone().reshape(1,1); // flatten to a single row
      Mat row_i = dst.row(i);
      image_row.convertTo(row_i, CV_32FC1, 1/255.); // scale pixels to [0,1]
   }
   return dst;
}

Main:

int main()
{
    PCA pca;

    vector<Mat> images_train;
    vector<Mat> images_test;
    vector<int> labels_train;
    vector<int> labels_test;

    read_csv("train1k.txt",images_train,labels_train);
    read_csv("test1k.txt",images_test,labels_test);

    Mat rawTrainData = rollVectortoMat(images_train); // one image per row
    Mat rawTestData  = rollVectortoMat(images_test);

    Mat trainLabels = getLabels(labels_train); // getLabels: see the sketch below
    Mat testLabels  = getLabels(labels_test);

    int pca_size = 500; // target feature dimension

    Mat trainData(rawTrainData.rows, pca_size, rawTrainData.type());
    Mat testData(rawTestData.rows, pca_size, rawTestData.type());

    // compute the PCA basis from the training data only
    pca(rawTrainData, Mat(), CV_PCA_DATA_AS_ROW, pca_size);

    // project every sample onto the first pca_size principal components
    for(int i = 0; i < rawTrainData.rows ; i++)
        pca.project(rawTrainData.row(i), trainData.row(i));

    for(int i = 0; i < rawTestData.rows ; i++)
        pca.project(rawTestData.row(i), testData.row(i));

    return 0;
}
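
getLabels is not spelled out here; a minimal sketch of what it can look like (an assumption on my part; declare it before main), converting the label vector into a one-column float Mat:

// Sketch (not from the original code): labels as a one-column CV_32FC1 Mat.
Mat getLabels(const vector<int>& labels)
{
    Mat dst(static_cast<int>(labels.size()), 1, CV_32FC1);
    for(size_t i = 0; i < labels.size(); i++)
        dst.at<float>(static_cast<int>(i), 0) = static_cast<float>(labels[i]);
    return dst;
}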

To summarize: you read a csv file whose lines look like image_path;label. Then you roll the images into a Mat variable row by row, and apply PCA to reduce them to 500 features. I applied this PCA reduction to bring 200*200 images (40,000 features) down to 500 features, and then applied an MLP to classify them. The resulting trainData and testData variables can be used with an SVM too (see the sketch below). You can also check how to train with an MLP in my SO post:

OpenCV Neural Network Sigmoid Output
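
If you go the SVM route instead, a rough sketch with the OpenCV 2.x CvSVM API (the parameter values are placeholders, not tuned):

// Sketch: train CvSVM on the PCA-projected data (OpenCV 2.x API).
CvSVMParams params;
params.svm_type    = CvSVM::C_SVC;
params.kernel_type = CvSVM::RBF;
params.term_crit   = cvTermCriteria(CV_TERMCRIT_ITER, 1000, 1e-6);

CvSVM svm;
svm.train(trainData, trainLabels, Mat(), Mat(), params);

float prediction = svm.predict(testData.row(0)); // predict one test sample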

  • Thank you for the suggestion. Will definitely try this. But may I ask what you mean by the line "calculate each column's mean value in the feature extraction step"? Does the column refer to a column of the matrix representation of each image? – user3339658 Mar 08 '14 at 11:37
  • Let's say you have a 100*100 RGB image. If you take each channel's column-wise mean values, this makes 100*3 = 300 features. It was just a suggestion; I don't expect it to represent something meaningful about the object, but it should be way better than using just 4 features :) – yutasrobot Mar 09 '14 at 07:03
  • Yeah, if you represent it as matrices, you will have three 100*100 Mat arrays by using OpenCV's split method. – yutasrobot Mar 09 '14 at 07:04

If each pixel of the image were a feature you wanted to train the SVM with, then each row would have to list all features and therefore all pixels. In your case, you only have 3 features per image (the mean R, G, and B values), so there shouldn't be any problem.

Of course, you can perfectly well train an SVM with 3 dimensions. But setting the SVM aside: is the average color even a sensible metric for your images?

  • Honestly, I am not confident that these 3 features would be enough to produce accurate classification results. I actually based the use of the 3 features on a research paper which used them and reported high accuracy (in the classification of leaf diseases, to be specific). Thank you for raising that point, @MSalters – user3339658 Mar 08 '14 at 11:27
  • Do you have any suggestions on which features I should include aside from the mean color? I am planning to add more features, but I don't know how many would suffice. Do you have any idea? I am planning to include size and texture, but I'm still looking for ways to compute them manually. Can you suggest means of obtaining these features? – user3339658 Mar 08 '14 at 11:29
  • @user3339658: It's a different matter if you've already established a Region of Interest, of course. The color of the leaf is useful, the average color of the leaf and its surroundings less so. – MSalters Mar 08 '14 at 21:43
  • Yes, this is what the ROI looks like: https://drive.google.com/file/d/0B1aXcXzD_OADTWFCbGlEYWZVcTg/edit?usp=sharing. Don't you think the average color of this ROI would be useful? – user3339658 Mar 09 '14 at 00:36