In Computer Vision and Object Detection, a common evaluation method is mAP. What is it and how is it calculated?
6 Answers
mAP is Mean Average Precision.
It is used differently in Information Retrieval (References [1], [2]) and in multi-class classification / object detection settings.
To calculate it for object detection, you calculate the average precision for each class in your data based on your model predictions. Average precision is related to the area under the precision-recall curve for a class. Taking the mean of these per-class average precisions gives you the mean Average Precision.
To calculate Average Precision, see [3].
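For illustration, a minimal sketch of that final averaging step (the class names and AP values below are hypothetical; in practice each AP comes from the per-class precision-recall computation described in [3]):

```python
# Hypothetical per-class AP values, e.g. produced by a per-class PR-curve computation.
per_class_ap = {"car": 0.72, "person": 0.65, "dog": 0.58}

# mAP is simply the mean of the per-class average precisions.
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)
print(f"mAP = {mean_ap:.3f}")  # -> mAP = 0.650
```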

-
What should be the minimum mAP score for an object detection model? – Hamza Jun 03 '21 at 11:24
-
Is there an original paper where mAP was proposed? – Jürgen K. Oct 04 '21 at 17:57
-
@JürgenK - Here's the original paper from Zisserman's group at Oxford, who proposed the PASCAL VOC challenge in 2009. The definition of mAP is on page 11. http://homepages.inf.ed.ac.uk/ckiw/postscript/ijcv_voc09.pdf – rayryeng Aug 11 '22 at 05:31
Quotes are from the above-mentioned Zisserman paper, Section 4.2 "Evaluation of Results" (page 11):
First, an "overlap criterion" is defined as an intersection-over-union greater than 0.5 (i.e. if a predicted box satisfies this criterion with respect to a ground-truth box, it is considered a detection). Then a matching is made between the ground-truth (GT) boxes and the predicted boxes using this "greedy" approach:
Detections output by a method were assigned to ground truth objects satisfying the overlap criterion in order ranked by the (decreasing) confidence output. Multiple detections of the same object in an image were considered false detections e.g. 5 detections of a single object counted as 1 correct detection and 4 false detections
Hence each predicted box is either a true positive or a false positive. Each ground-truth box is either detected (contributing a true positive) or missed (a false negative). There are no true negatives.
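To make the matching rule concrete, here is a minimal Python sketch of this greedy TP/FP assignment for a single image and a single class. This is my own illustration, not the official VOC code; the box format (x1, y1, x2, y2) and the function names are assumptions, and it follows the "no second chance" reading discussed in the comments below.

```python
def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def assign_tp_fp(detections, gt_boxes, iou_threshold=0.5):
    """detections: list of (confidence, box); gt_boxes: list of boxes.
    Returns one True (TP) / False (FP) flag per detection, in confidence order."""
    detections = sorted(detections, key=lambda d: d[0], reverse=True)  # decreasing confidence
    gt_matched = [False] * len(gt_boxes)
    flags = []
    for _, box in detections:
        # Find the ground-truth box with the highest overlap for this detection.
        best, best_iou = None, 0.0
        for i, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best, best_iou = i, overlap
        if best is not None and best_iou >= iou_threshold and not gt_matched[best]:
            gt_matched[best] = True
            flags.append(True)    # true positive
        else:
            flags.append(False)   # false positive: insufficient overlap, or duplicate detection
    return flags
```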
The average precision is then computed by averaging the precision values on the precision-recall curve at the recall levels [0, 0.1, ..., 1] (i.e. the average of 11 precision values). To be more precise, we consider a slightly corrected PR curve: for each curve point (p, r), if there is a different curve point (p', r') such that p' > p and r' >= r, we replace p with the maximum such p'.
What is still unclear to me is what is done with those GT boxes that are never detected (even if the confidence is 0). This means that there are certain recall values that the precision-recall curve will never reach, and this makes the average precision computation above undefined.
Edit:
Short answer: in the region where the recall is unreachable, the precision drops to 0.
One way to explain this is to assume that when the threshold for the confidence approaches 0, an infinite number of predicted bounding boxes light up all over the image. The precision then immediately goes to 0 (since there is only a finite number of GT boxes) and the recall keeps growing on this flat curve until we reach 100%.
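A minimal NumPy sketch of this 11-point interpolation (my own code, not the official PASCAL VOC implementation), using the toy precision/recall values discussed in the comments below:

```python
import numpy as np

def eleven_point_ap(recalls, precisions):
    # `recalls` / `precisions` are the points of the raw PR curve.
    recalls = np.asarray(recalls, dtype=float)
    precisions = np.asarray(precisions, dtype=float)
    ap = 0.0
    for level in np.linspace(0.0, 1.0, 11):      # recall levels 0, 0.1, ..., 1.0
        reachable = recalls >= level
        # Corrected precision: max precision over all points with recall >= level;
        # unreachable recall levels contribute precision 0.
        ap += precisions[reachable].max() / 11.0 if reachable.any() else 0.0
    return ap

recalls    = [0.0, 0.0, 0.333, 0.666]
precisions = [1.0, 0.0, 0.5,   0.666]
print(round(eleven_point_ap(recalls, precisions), 3))  # -> 0.454
```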
-
That is not the only thing that is unclear. Consider a case where there are two predicted boxes (P1, P2) and two ground-truth boxes (T1, T2), where P2 has higher confidence than P1. Both P1 and P2 overlap T1. Since P2 has the higher confidence, it is clear that P2 should be considered the match for T1. What is not given is if P1 also has some IoU overlap with T2, but lower than the IoU with T1, should P1 be given a "second chance" to try to match itself to T2, or should it not? – Martin Apr 13 '17 at 13:06
-
@Jonathan: so do we simply discard the predictions with IoU<0.5 and compute the area under PR curve for the predictions with IoU>=0.5? – Alex Oct 20 '17 at 16:50
-
@Jonathan: ok so If I have 1 class with 3 instances, IoUs with gt masks are 0.1,0.6,0.7, and corresponding precisions are 0.7,0.5,0.2, and I use a threshold of IoU>0.5, what would be the precision in this case? Is it 2/3 (share of predictions with IoU>0.5) or 0.35 (0.2+0.5)/2? – Alex Oct 22 '17 at 21:03
-
@Alex The precisions in your case are [1, 0, 0.5, 0.666], corresponding to recalls of [0, 0, 0.333, 0.666]. However, the "corrected" precision-recall curve is [1, 0.666, 0.666, 0.666], at the same recalls. The *average* precision is the average of 11 precision values interpolated from this curve at recalls [0, 0.1, ..., 1.0]. We set precision 0 at the unachievable recalls above 0.666. So it's avg([1, 0.666, 0.666, 0.666, 0.666, 0.666, 0.666, 0, 0, 0, 0]) ≈ 0.454 – Jonathan Oct 29 '17 at 09:12
-
@Jonathan: what if there is one prediction for multiple objects? If there's 1 prediction for 4 objects, will it count as 1 TP and 3 FP? – Alex Dec 17 '17 at 10:18
-
Fig.5 of the YOLOv3 paper (https://arxiv.org/pdf/1804.02767.pdf) shows the results of two hypothetical detectors. It is stated that their mAP is the same. Is this correct? If I understand the greedy approach correctly, all but one bounding box would be classified as false detections and thus the mAP would NOT be the same. – gebbissimo Aug 06 '18 at 12:33
-
@Martin As you can see from [github](https://github.com/Cartucho/mAP/blob/master/main.py#L576), P1 will not get a second chance; it will be counted as a false positive. – qq456cvb May 11 '19 at 01:29
-
@qq456cvb Your link should be updated: https://github.com/Cartucho/mAP/blob/20d2f89/main.py#L576 – davidvandebunte Mar 26 '20 at 17:36
-
For detection, a common way to determine if one object proposal was right is Intersection over Union (IoU, IU). This takes the set A of proposed object pixels and the set B of true object pixels and calculates:

IoU(A, B) = |A ∩ B| / |A ∪ B|
Commonly, IoU > 0.5 means that it was a hit, otherwise it was a fail. For each class c, one can calculate the

- True Positives TP(c): a proposal was made for class c and there actually was an object of class c
- False Positives FP(c): a proposal was made for class c, but there is no object of class c
- Average Precision for class c: AP(c) = #TP(c) / (#TP(c) + #FP(c))

The mAP (mean average precision) is then:

mAP = (1 / |classes|) · Σ_{c ∈ classes} AP(c)
Note: If one wants better proposals, one increases the IoU threshold from 0.5 to a higher value (up to 1.0, which would be perfect). One can denote this with mAP@p, where p ∈ (0, 1] is the IoU threshold.
mAP@[.5:.95] means that the mAP is calculated over multiple IoU thresholds (from 0.5 to 0.95 in steps of 0.05) and then averaged.
Edit: For more detailed information see the COCO Evaluation metrics
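As a sketch of the pixel-set formulation above (representing pixels as (row, col) tuples in Python sets is my own choice, not part of the answer):

```python
def iou(proposed_pixels, true_pixels):
    # IoU(A, B) = |A ∩ B| / |A ∪ B| for two sets of pixels.
    a, b = set(proposed_pixels), set(true_pixels)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

a = {(r, c) for r in range(0, 10) for c in range(0, 10)}   # proposed object pixels
b = {(r, c) for r in range(5, 15) for c in range(0, 10)}   # true object pixels
print(iou(a, b))  # 50 / 150 ≈ 0.33 -> below 0.5, so this proposal counts as a fail
```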
-
Is there an original paper where mAP was proposed? Or where did you get this from? – Jürgen K. Oct 04 '21 at 17:59
-
Not sure whether the COCO paper is the original source, but in my opinion it is the one that currently sets the definition of mAP. You can find more information by clicking the Evaluation metrics link at the bottom of my post. Here is the COCO paper: https://arxiv.org/pdf/1405.0312.pdf; however, they do not spend much time detailing the evaluation. They do refer to their evaluation code, available in Python here: https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py – mrk Oct 05 '21 at 14:15
-
Thank you @mrk, I have one question: the model produces a confidence value for every bounding box. How does this confidence value affect the formulas you explained above? – Francesco Taioli Nov 09 '21 at 13:43
I think the important part here is showing how object detection can be framed as a standard information retrieval problem, for which there exists at least one excellent description of average precision.
The output of an object detection algorithm is a set of proposed bounding boxes, and for each one, a confidence score and classification scores (one score per class). Let's ignore the classification scores for now, and use the confidence as the input to a thresholded binary classification. Intuitively, the average precision is an aggregation over all choices for the threshold/cut-off value. But wait; in order to calculate precision, we need to know if a box is correct!
This is where it gets confusing/difficult; as opposed to typical information retrieval problems, we actually have an extra level of classification here. That is, we can't do exact matching between boxes, so we need to classify if a bounding box is correct or not. The solution is to essentially do a hard-coded classification on the box dimensions; we check if it sufficiently overlaps with any ground truth to be considered 'correct'. The threshold for this part is chosen by common sense. The dataset you are working on will likely define what this threshold for a 'correct' bounding box is. Most datasets just set it at 0.5 IoU and leave it at that (I recommend doing a few manual IoU calculations [they're not hard] to get a feel for how strict IoU of 0.5 actually is).
Now that we have actually defined what it means to be 'correct', we can just use the same process as information retrieval.
To find mean average precision (mAP), you stratify your proposed boxes by class (assigning each box the class with the maximum classification score), compute the average precision (AP) for each class as above, and then average (take the mean of) the APs over the classes.
TLDR; make the distinction between determining if a bounding box prediction is 'correct' (extra level of classification) and evaluating how well the box confidence informs you of a 'correct' bounding box prediction (completely analogous to information retrieval case) and the typical descriptions of mAP will make sense.
It's worth noting that Area under the Precision/Recall curve is the same thing as average precision, and we are essentially approximating this area with the trapezoidal or right-hand rule for approximating integrals.
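As a sketch of that last point, here is AP computed as the right-hand-rule area under the PR curve, once each detection has already been judged "correct" or not by the IoU check. The function and argument names are my own, not any particular library's API.

```python
import numpy as np

def average_precision(confidences, is_correct, num_ground_truth):
    order = np.argsort(-np.asarray(confidences))        # rank detections by decreasing confidence
    correct = np.asarray(is_correct, dtype=float)[order]
    tp = np.cumsum(correct)                             # true positives at each cut-off
    fp = np.cumsum(1.0 - correct)                       # false positives at each cut-off
    precision = tp / (tp + fp)
    recall = tp / num_ground_truth
    # Right-hand rule: precision times the increase in recall at each new detection.
    return float(np.sum(precision * np.diff(recall, prepend=0.0)))

# Toy example: three detections for one class, two ground-truth objects.
print(average_precision([0.9, 0.8, 0.4], [True, False, True], num_ground_truth=2))  # ≈ 0.833
```

Running this per class and taking the mean over classes gives mAP, as described in the previous paragraph.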

Definition: mAP → mean Average Precision
In most object detection contests there are many categories to detect, and the model is evaluated on one specific category at a time; the result of that evaluation is the AP for that category.
Once every category has been evaluated, the mean of all the APs is computed as the final result of the model, which is the mAP.
Intersection over Union (IoU) is a measure based on the Jaccard index that evaluates the overlap between two bounding boxes. It requires a ground-truth bounding box and a predicted bounding box. By applying IoU we can tell whether a detection is valid (true positive) or not (false positive). IoU is given by the overlapping area between the predicted bounding box and the ground-truth bounding box, divided by the area of their union.

-
The question is about mAP and not about IoU. So, you should at least clarify how IoU is related to mAP. – nbro Jun 13 '20 at 14:33