
I have read the Viola paper from 2004. In section 3.1 they explain the threshold calculation, but I am quite confused. It reads:

For each feature, the examples are sorted based on feature value

Question 1) Is the sorted list a list of Haar feature values calculated from the integral images of the examples? So if we have one feature and 10 images (positive and negative), do we get 10 results, one associated with each input image?

The AdaBoost optimal threshold for that feature can then be computed in a single pass over this sorted list. For each element in the sorted list, four sums are maintained and evaluated: the total sum of positive example weights T+, the total sum of negative example weights T−, the sum of positive weights below the current example S+ and the sum of negative weights below the current example S−.

Question 2) What is the purpose of sorting? I guess the one with the highest value is the one that describes the image best. But algorithmically, how does it affect (S−, S+, T+, T−)?

Question 3) Now for a sorted list we calculate (S−, S+, T+, T−). Does this mean each entry holds its own (S−, S+, T−, T+), or is there only one (S−, S+, T−, T+) for the whole list?

The error for a threshold which splits the range between the current and previous example in the sorted list is: e = min(S+ + (T− − S−), S− + (T+ − S+))

Question 4) This somewhat answers my previous question, but I am not sure. In order for us to have an "e" for each input image, we need to maintain (S−, S+, T−, T+) for each entry in the list. But what do we do with "e" after we calculate N of them (one for each image) for that feature?

Thanks in advance. Please let me know if this is confusing or you need more clarification on my questions.

Evren Bingøl

1 Answer


Question 1) Is the sorted list a list of Haar feature values calculated from the integral images of the examples? So if we have one feature and 10 images (positive and negative), do we get 10 results, one associated with each input image?

You get 10 results for the feature, one result associated with each input image. Each image is marked as positive or negative.
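As a concrete sketch of this (the rectangle layout and all function names here are illustrative, not taken from the paper), evaluating one two-rectangle Haar feature through an integral image yields exactly one response per image:

```python
def integral_image(img):
    # ii[y][x] = sum of all pixels above and to the left of (x, y),
    # so ii has one extra row and column of zeros.
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    # Sum of any rectangle in four lookups, the point of integral images.
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    # A simple two-rectangle edge feature: left half minus right half.
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Two tiny 4x4 example "images": one with a bright left half, one uniform.
images = [
    [[1, 1, 0, 0]] * 4,   # strong left-vs-right contrast
    [[1, 1, 1, 1]] * 4,   # no contrast
]
# One feature evaluated on many images -> one response per image.
responses = [two_rect_feature(integral_image(img), 0, 0, 4, 4) for img in images]
# responses == [8, 0]
```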

Question 2) What is the purpose of sorting? I guess the one with the highest value is the one that describes the image best. But algorithmically, how does it affect (S−, S+, T+, T−)?

The image with the highest value is the one with the strongest response to that feature. You sort by the response, not by the weight.

The reason you sort them is that two of the things you are trying to calculate are "the sum of positive weights below the current example S+ and the sum of negative weights below the current example S−". If the list is sorted then you can keep a running sum, and at each point, all the examples whose weights you have added to the sum until then will have a feature response less than (i.e. "below") the current example. That doesn't work if the list isn't sorted. Then you can evaluate the error associated with using the response level halfway between that example and the next one as a threshold.
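A minimal Python sketch of that single pass, assuming `responses` holds one feature value per example and `labels` (+1/−1) and `weights` are the matching AdaBoost labels and weights (the function name and data layout are my own, not from the paper):

```python
def best_threshold(responses, labels, weights):
    # Sort all three lists together by feature response.
    order = sorted(range(len(responses)), key=lambda i: responses[i])
    responses = [responses[i] for i in order]
    labels = [labels[i] for i in order]
    weights = [weights[i] for i in order]

    # T+ and T- are fixed totals over the whole list, computed once.
    t_pos = sum(w for w, l in zip(weights, labels) if l > 0)
    t_neg = sum(w for w, l in zip(weights, labels) if l < 0)

    s_pos = s_neg = 0.0  # running sums of weights "below" the current example
    best_err, best_thresh = float("inf"), None
    for i in range(len(responses)):
        # Error of a threshold placed just below responses[i]: either
        # everything above it is called positive, or everything below it is.
        err = min(s_pos + (t_neg - s_neg), s_neg + (t_pos - s_pos))
        if err < best_err:
            prev = responses[i - 1] if i > 0 else responses[i]
            best_err, best_thresh = err, (prev + responses[i]) / 2.0
        # Only now does this example's weight move into the "below" sums.
        if labels[i] > 0:
            s_pos += weights[i]
        else:
            s_neg += weights[i]
    return best_thresh, best_err
```

Because the list is sorted, each example's weight is added to S+ or S− exactly once, so the whole search is a single O(n) pass after the O(n log n) sort.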

Question 3) Now for a sorted list we calculate (S−, S+, T+, T−). Does this mean each entry holds its own (S−, S+, T−, T+), or is there only one (S−, S+, T−, T+) for the whole list?

There will be one S− and one S+ per example, because each is "the sum of positive weights below the current example". T+ and T− are calculated once for the whole list; I don't know why they say you need to maintain them for each element.

Question 4) This somewhat answers my previous question, but I am not sure. In order for us to have an "e" for each input image, we need to maintain (S−, S+, T−, T+) for each entry in the list. But what do we do with "e" after we calculate N of them (one for each image) for that feature?

You choose the minimum out of all of them, and that's the optimal place to put the threshold (the midpoint of the responses of those two examples), because it has the minimum error (false positives + false negatives). By the way, the reason there are two choices at each point (i.e. e = min(S+ + (T− − S−), S− + (T+ − S+))) is that you can choose whether values above that response level are considered positive (the first term) or values below it are considered positive (the second term).

If it's the former, then S+ counts your false positives and (T− − S−) your false negatives. If it's the latter, then S− counts your false negatives and (T+ − S+) your false positives.
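As a small sketch of those two choices in code (the variable names mirror the paper's sums, but the function itself is my own illustration):

```python
def split_error(s_pos, s_neg, t_pos, t_neg):
    # Polarity +1: everything above the threshold is classified positive.
    # Misclassified weight = positives below (S+) + negatives above (T- - S-).
    err_above_positive = s_pos + (t_neg - s_neg)
    # Polarity -1: everything below the threshold is classified positive.
    # Misclassified weight = negatives below (S-) + positives above (T+ - S+).
    err_below_positive = s_neg + (t_pos - s_pos)
    if err_above_positive <= err_below_positive:
        return err_above_positive, +1
    return err_below_positive, -1
```

The weak classifier keeps whichever polarity gives the smaller error at each split point, and the final threshold is the split with the smallest such error over the whole sorted list.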

samgak
  • So we either pick S+ + (T− − S−) or S− + (T+ − S+) throughout the whole training process. And if we select the first term, then that means everything above that point (that index in the sorted list) is classified positive, which gives us (total number correctly classified) / (all examples). When I saw the "min" I thought we calculate both S+ + (T− − S−) and S− + (T+ − S+) and take the min of those two outputs. But what you are saying is that "min" refers to picking the smallest result from either term throughout the list. – Evren Bingøl Aug 24 '16 at 03:19
  • It's the same either way, isn't it? You just want the overall min over all elements and both options. – samgak Aug 24 '16 at 03:54
  • I don't think it is the same. The first term labels everything below the current example negative and everything above positive, whereas the second one is the opposite, which changes the polarity; please correct me if I am wrong. Just to recap, what you are saying is: you calculate both terms, take the min of them, and save it as the error value (with the polarity); then you go through each error value saved for each example, pick the smallest "min" error, assign it to that feature as the final error, and the position in the list where you got it is the threshold. – Evren Bingøl Aug 26 '16 at 20:03
  • Never mind, I found out that you calculate both the left and right term, take the min of those, and assign that error to that example; then you loop over the list and pick the min. I got confused about whether "min" means picking the smaller term or picking the min of the list. I guess you take the min of the terms and then the min error over the list. Thanks. – Evren Bingøl Aug 26 '16 at 22:32
  • [This explains it really well.](https://courses.cs.washington.edu/courses/cse455/16wi/notes/15_FaceDetection_ink.pptx) – Evren Bingøl Aug 26 '16 at 22:38