Why is there a discrepancy in the imagenet dataset labels?

Question

Are the labels used for training and the ones used for validation the same? I thought they should be the same; however, there seems to be a discrepancy between the label lists that are available online. When I downloaded the ImageNet 2012 validation labels from the official website, I got a list that starts with kit_fox as the first label, and it matches the 2012 validation images I downloaded from the same website. This is an example of the labels: https://gist.github.com/aaronpolhamus/964a4411c0906315deb9f4a3723aac57

However, almost all the pretrained models, including those trained by Google, use ImageNet labels that start with tench, tinca tinca instead. See here: https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a

Why is there such a huge discrepancy? Where did the 'tinca tinca' kind of labels come from?

If we use the first label mapping, which corresponds to the actual validation images, we face another problem: 2 class names ("crane" and "maillot") are duplicated. "Crane", for example, appears under the same name for two different kinds of crane - the mechanical crane and the animal crane - so a mapping keyed on names ends up with 100 images in each of these 2 classes instead of the expected 50. If we do not use the first mapping, where is a reliable source of validation images that corresponds to the second label mapping?

I also realised that 'maillot' is present twice in the dataset, and it means the same thing both times. 'crane' is also present twice, but here we have different meanings - the bird and the object. — anushka, Feb 23 '20 at 18:52

score 0 · Answer 1 · answered Jul 25 '17 at 00:27


I have the same problem in my fine-tuning. You can solve it by replacing the class names (such as tench, tinca tinca) with their synset numbers. You can find the mapping here
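As a rough sketch of what keying on synset numbers looks like: the snippet below uses a small hand-written sample of (synset ID, name) pairs rather than a real mapping file, and the function names are hypothetical. Assigning indices by sorted synset ID is an assumption made for illustration, so check it against whatever ordering your pretrained model actually expects.

```python
from collections import defaultdict

# Hypothetical sample of (synset ID, class name) pairs for illustration only.
# Verify the IDs against your own mapping file before relying on them.
SAMPLE = [
    ("n02119789", "kit fox, Vulpes macrotis"),
    ("n01440764", "tench, Tinca tinca"),
    ("n02012849", "crane"),  # the bird
    ("n03126707", "crane"),  # the machine
]


def duplicate_names(synsets):
    """Return the class names that are shared by more than one synset ID."""
    ids_by_name = defaultdict(list)
    for wnid, name in synsets:
        ids_by_name[name].append(wnid)
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


def index_by_synset(synsets):
    """Give every synset ID its own integer index (assumed order: sorted ID)."""
    wnids = sorted(wnid for wnid, _ in synsets)
    return {wnid: i for i, wnid in enumerate(wnids)}


dupes = duplicate_names(SAMPLE)
class_index = index_by_synset(SAMPLE)
```

Because the synset ID is unique per class, the two "crane" entries get separate indices, whereas a dictionary keyed on the name would merge them.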


Glauco Roberto

