
I am trying to learn about facial landmark detection models, and I notice that many of them use NME (Normalized Mean Error) as the performance metric:

[Image: NME formula]

The formula is straightforward: it calculates the L2 distance between the ground-truth points and the model's predictions, then divides it by a normalization factor, which varies from dataset to dataset.
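To make the description concrete, here is a minimal sketch of how I understand the computation. The function name `nme` and the sample points are made up for illustration, and the normalization factor is passed in as a plain number because its definition depends on the dataset.

```python
import math

def nme(pred, gt, norm):
    """Mean L2 distance between paired points, divided by a normalization factor.

    pred, gt: lists of (x, y) tuples in the same order.
    norm: dataset-specific normalization factor (assumed to be given).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of points")
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / (len(dists) * norm)

# Illustrative values only: one point is off by 5 pixels, the other is exact.
gt = [(0.0, 0.0), (10.0, 0.0)]
pred = [(3.0, 4.0), (10.0, 0.0)]
print(nme(pred, gt, norm=10.0))  # 0.25
```

Note that this version requires one prediction per ground-truth point, which is exactly the assumption that breaks in the situation described next.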

However, when applying this formula to a landmark detector that someone else developed, I have to deal with a non-trivial situation: some detectors may not generate enough landmarks for some input images (possibly because of NMS, a problem inherent to the model, image quality, etc.). As a result, some ground-truth points might have no corresponding point in the prediction result.

So how should I solve this problem? Should I add such missing points to a "failure result set", use FR (failure rate) to measure the model, and ignore them when calculating the NME?
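A rough sketch of what I have in mind is below. It is only one possible convention, not an established one: missing predictions are represented as `None`, the failure rate is counted per point rather than per image, and the function name `nme_and_failure_rate` and the `fail_threshold` value are hypothetical choices of mine.

```python
import math

def nme_and_failure_rate(pred, gt, norm, fail_threshold=0.1):
    """NME over detected points only, plus a per-point failure rate.

    pred[i] is an (x, y) tuple, or None when no landmark was produced for gt[i].
    fail_threshold is a hypothetical cutoff on the normalized error.
    """
    # Normalized errors for the points that were actually predicted.
    errors = [math.dist(p, g) / norm for p, g in zip(pred, gt) if p is not None]
    missing = sum(1 for p in pred if p is None)
    # A point fails if it is missing or its normalized error exceeds the cutoff.
    failed = missing + sum(1 for e in errors if e > fail_threshold)
    nme = sum(errors) / len(errors) if errors else float("nan")
    return nme, failed / len(gt)

# Illustrative values only: the second landmark is missing.
gt = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
pred = [(3.0, 4.0), None, (0.0, 10.0)]
nme_value, fr = nme_and_failure_rate(pred, gt, norm=10.0)
print(nme_value, fr)
```

With this convention the NME alone can look good for a detector that drops hard points, so the two numbers would have to be reported together.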

user8510613
  • I’m voting to close this question because it is not about programming as defined in the [help] but about DL theory and/or methodology - please see the intro and NOTE in the `deep-learning` [tag info](https://stackoverflow.com/tags/deep-learning/info). – desertnaut Sep 13 '21 at 15:14

1 Answer


Suppose the output of your neural network is a 10x1 vector holding the points, such as [x1, y1, x2, y2, ..., x5, y5]. This vector has a fixed length, because it is determined by the number of output neurons in your model. If points appear to be missing (for example, you see only 4 of 5 points), it is because some points fall beyond the image width and height, or have negative coordinates, as in [-0.1, -0.2, 0.5, 0.7, ...]. There, the first two values (x1, y1) give a point you cannot see on the image, as if it were missing, but it is still in the vector, so you can calculate the NME. In some custom neural nets this is possible because missing values are replaced with the largest-error points.
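As a small illustration of the point above, the flat vector can be decoded into points and the off-image ones flagged. This is only a sketch: the helper names `decode` and `outside_image` are made up, and it assumes coordinates are normalized to the range [0, 1], which the example values suggest but which may not hold for your model.

```python
def decode(vec):
    """Turn a flat [x1, y1, x2, y2, ...] vector into a list of (x, y) tuples."""
    return [(vec[i], vec[i + 1]) for i in range(0, len(vec), 2)]

def outside_image(point):
    """True if the point lies outside the image (assumes coordinates in [0, 1])."""
    x, y = point
    return not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0)

# Illustrative values only, taken from the example vector above.
vec = [-0.1, -0.2, 0.5, 0.7]
points = decode(vec)
flags = [outside_image(p) for p in points]
print(points)  # [(-0.1, -0.2), (0.5, 0.7)]
print(flags)   # [True, False]
```

Every point still has a coordinate pair, so the NME can be computed over the full vector, and the flagged points simply contribute a large error.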

  • Thanks for your reply, but some models do not produce output of the same size as the gt. Take Faster R-CNN (I know it's not a landmark detection model, just an example): the model itself might yield many more landmarks than the gt, and only after some post-processing (like NMS) does it yield the final result. Because post-processing is involved, the number of outputs might not match the gt. – user8510613 Sep 13 '21 at 06:54