How to create labels (auto encoding) in Python

Question

I have a data frame like this:

Name  subname Feature1  Feature2 ...
AAA     a     0.123     0.345 ...
AAA     b     0.123     0.345 ...
BBB     a     0.123     0.345 ...
BBB     b     0.123     0.345 ...

I want to create labels (adding a new column):

Name  subname Feature1  Feature2 ... Class
AAA     a     0.123     0.345 ...    1
AAA     b     0.123     0.345 ...    1
BBB     a     0.123     0.345 ...    2
BBB     b     0.123     0.345 ...    2

I want to fit the data to a classification model. Is there an efficient way to create those labels? I have more than 5000 rows. Many thanks.

What are you encoding off of, just `Name`? If so, use `df['Class'] = pd.factorize(df['Name'])[0] + 1`. If it is a 2D factorization, you can use `np.unique` with `return_inverse` — user3483203, Jul 22 '19 at 15:48
Do you have the labels in a separate DataFrame or Series, or what? — adnanmuttaleb, Jul 22 '19 at 15:50
Yes, only according to `Name`. I want the labels in the last column so that I can fit a GBDT model to select features. Is that the proper way to do it? — Cecilia, Jul 22 '19 at 15:51
Both answers in the dupe are perfectly valid, you can choose which to use — user3483203, Jul 22 '19 at 15:53
@user3483203 I checked some tutorials on Google, can I use something like 'from sklearn.preprocessing import LabelEncoder'? Would this be the same? Many thanks. — Cecilia, Jul 22 '19 at 16:07

score 1 · Accepted Answer · answered Jul 22 '19 at 15:52

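You can try `pd.factorize`:

import pandas as pd

# Codes start at 0 and follow the order of first appearance.
labels, uniques = pd.factorize(df['Name'].tolist())
df['labels'] = labels

For the sample data, `labels` is `array([0, 0, 1, 1])`; add 1 if the classes should start at 1.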
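
As a minimal, self-contained sketch of the same idea, the snippet below rebuilds a small frame shaped like the one in the question (the sample values and the `Class` column name are assumptions taken from the question, not real data) and shifts the codes by one so that they match the desired output:

```python
import pandas as pd

# Hypothetical sample frame mirroring the layout shown in the question.
df = pd.DataFrame({
    "Name": ["AAA", "AAA", "BBB", "BBB"],
    "subname": ["a", "b", "a", "b"],
    "Feature1": [0.123, 0.123, 0.123, 0.123],
    "Feature2": [0.345, 0.345, 0.345, 0.345],
})

# factorize assigns integer codes in order of first appearance, starting at 0.
codes, uniques = pd.factorize(df["Name"])

# Shift by one so the classes start at 1, as in the desired output.
df["Class"] = codes + 1

print(df["Class"].tolist())  # [1, 1, 2, 2]
print(list(uniques))         # ['AAA', 'BBB']
```

`uniques` keeps the mapping back to the original names, so `uniques[code - 1]` recovers the `Name` for a given class.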

Lumos
