I have a dataset with categorical data whose categories have a natural order. For instance, PhD ranks higher than Masters, and MSc ranks higher than BSc.

I know I should use LabelEncoder, but I don't want Python to assign codes to these categories arbitrarily. I want higher codes for higher levels: Phd = 4, Msc = 3, Bsc = 2, O Levels = 1 and No education = 0.

Is there any way I can do this?

seun

1 Answer

LabelEncoder encodes the categories in alphabetical order and stores them in its classes_ attribute, so the codes do not follow your ranking. This is the default behaviour:

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit(['Phd', 'Msc','Bsc', 'O Levels','No education'])
# The code of each label is its index in this sorted array
le.classes_
# Output: array(['Bsc', 'Msc', 'No education', 'O Levels', 'Phd'], dtype='|S12')
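To make the mismatch concrete, here is a small sketch that uses plain Python's sorted() as a stand-in for the alphabetical order shown above (LabelEncoder itself is not called, and the names desired and alpha_code are only illustrative):

```python
# Desired ranking from the question, lowest to highest.
desired = ['No education', 'O Levels', 'Bsc', 'Msc', 'Phd']

# Alphabetical order, standing in for what ends up in classes_.
alphabetical = sorted(desired)
alpha_code = {label: code for code, label in enumerate(alphabetical)}

# Print each label next to the code alphabetical ordering would give it.
for label in desired:
    print(label, alpha_code[label])
```

With alphabetical codes, 'No education' ends up above both 'Bsc' and 'Msc', which is the opposite of the intended ranking.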

How many categories are there? If there are only a few, you can do the conversion yourself with a dict, similar to this answer:

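import numpy as np

# Explicit label -> code mapping, in the order you want
my_dict = {'Phd':4, 'Msc':3 , 'Bsc':2, 'O Levels':1, 'No education':0}

y = ['No education', 'O Levels','Bsc', 'Msc','Phd']
# Look up every label in the dict
np.vectorize(my_dict.get)(y)

# Output: array([0, 1, 2, 3, 4])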
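One thing to watch with a dict lookup is that dict.get returns None for a label that is not in the mapping, so a typo in the data can slip through unnoticed. A minimal sketch of a stricter variant follows, assuming you can list the categories from lowest to highest. The helper make_ordinal_encoder and the name education_order are hypothetical, not part of scikit-learn or NumPy:

```python
def make_ordinal_encoder(ordered_labels):
    """Return a function mapping each label to its position in ordered_labels."""
    # Position in the list becomes the code: first label -> 0, last -> highest.
    codes = {label: code for code, label in enumerate(ordered_labels)}

    def encode(values):
        # Fail loudly on labels that were never declared, instead of emitting None.
        unknown = sorted({v for v in values if v not in codes})
        if unknown:
            raise ValueError(f"unseen labels: {unknown}")
        return [codes[v] for v in values]

    return encode


# Lowest to highest, as ranked in the question.
education_order = ['No education', 'O Levels', 'Bsc', 'Msc', 'Phd']
encode = make_ordinal_encoder(education_order)

codes = encode(['Phd', 'Bsc', 'No education', 'Msc'])
print(codes)  # [4, 2, 0, 3]
```

Deriving the codes from an ordered list keeps the ranking in one place, so adding a level later only means inserting it at the right position.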
Vivek Kumar