
I use LabelEncoder to generate labels for a PyTorch project, with code like this:

from sklearn.preprocessing import LabelEncoder  
label_encoder = LabelEncoder()
label_encoder.fit(annotation['instance_ids'])
annotation['labels'] = list(map(int,label_encoder.transform(annotation['instance_ids'])))

My questions are:

  1. Are the generated labels strictly the same across different runs? More specifically, will instance_id_1 always be mapped to label_1?
  2. What ordering rule is used to generate the labels?
Ink
  • It seems to be lexicographic order, according to [this other question](https://stackoverflow.com/questions/51308994/python-sklearn-determine-the-encoding-order-of-labelencoder), which may help you. – Barbara Gendron Feb 21 '23 at 13:04

1 Answer


After label encoding, each categorical value is assigned a numeric value. You might wonder why the numbering does not follow the order in which the values appear (top-down): the numbers are assigned in alphabetical (sorted) order. In the example image, Delhi is assigned 0, followed by Gujarat as 1, and so on. Because the order depends only on the sorted set of unique values, fitting on the same set of values gives the same mapping on every run.
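A minimal sketch of the ordering rule, assuming scikit-learn is installed. The instance IDs are hypothetical and deliberately unsorted. It fits two separate encoders on the same set of values in different orders and compares the resulting mappings.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical instance IDs, deliberately unsorted and with a duplicate.
instance_ids = ["id_7", "id_2", "id_9", "id_2", "id_10"]

encoder = LabelEncoder()
encoder.fit(instance_ids)

# classes_ holds the unique values in sorted order; each label is the
# index of its value in that array.
mapping = {str(cls): idx for idx, cls in enumerate(encoder.classes_)}
labels = [int(x) for x in encoder.transform(instance_ids)]

# A fresh encoder fitted on the same set of values, in a different order.
shuffled = ["id_10", "id_9", "id_2", "id_7"]
other = LabelEncoder().fit(shuffled)
mapping_again = {str(cls): idx for idx, cls in enumerate(other.classes_)}

print(mapping)                   # {'id_10': 0, 'id_2': 1, 'id_7': 2, 'id_9': 3}
print(labels)                    # [2, 1, 3, 1, 0]
print(mapping == mapping_again)  # True
```

Note that strings are compared lexicographically, so "id_10" sorts before "id_2". The mapping is only stable while the set of unique values stays the same: if a new ID appears in a later run, the labels of every ID that sorts after it shift by one.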

Very important:

Label encoding converts the data into machine-readable form by assigning a unique number (starting from 0) to each class. This may introduce priority issues when training on the data set: a label with a high value may be treated as having higher priority than a label with a lower value.

Example: an attribute has the output classes Mexico, Paris, and Dubai. With label encoding in alphabetical order, Dubai is replaced with 0, Mexico with 1, and Paris with 2. A model may then interpret Paris as having higher priority than Dubai and Mexico during training, but there is actually no such priority relation between these cities.
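Running the three cities through scikit-learn's LabelEncoder shows the exact integers assigned. They follow sorted order, so they may differ from the order in which the cities are listed. This is only a sketch, assuming scikit-learn is installed; the variable names are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

cities = ["Mexico", "Paris", "Dubai"]

encoder = LabelEncoder()
codes = [int(c) for c in encoder.fit_transform(cities)]
city_to_code = dict(zip(cities, codes))

print(city_to_code)  # {'Mexico': 1, 'Paris': 2, 'Dubai': 0}

# The integers can be compared, although the comparison means nothing
# for the cities themselves.
paris_above_dubai = city_to_code["Paris"] > city_to_code["Dubai"]
print(paris_above_dubai)  # True
```

A model that treats the column as numeric sees the same artificial ordering as this comparison does.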