I have a data frame as below:

df = pd.DataFrame({'Item':['A1','B1','C1','D1'],'Category':['A','A','C','B']})
df
    Item    Category
0   A1      A
1   B1      A
2   C1      C
3   D1      B

I would like to assign clusters manually, i.e. Category A belongs to Cluster 1, C to Cluster 2, B to Cluster 3, etc. The desired result is:

df = pd.DataFrame({'Item':['A1','B1','C1','D1'],'Category':['A','A','C','B'], 'Label':['1','1','2','3']})
df

    Item    Category    Label
0   A1      A           1
1   B1      A           1
2   C1      C           2
3   D1      B           3

I am thinking of using label encoding. Are there any other methods I can try? What is the appropriate way to do it?

Prince Modi

2 Answers

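IIUC, use pd.factorize. It returns a (codes, uniques) pair, and the codes start at 0 in order of first appearance, hence the [0] and the + 1: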

df['Label'] = pd.factorize(df['Category'])[0] + 1
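
To make the one-liner above easier to follow, here is a minimal self-contained sketch that unpacks both return values of pd.factorize. The mapping dict and the Label_from_map column are names introduced only for this illustration; they show that the same labels can be written out as an explicit Category-to-cluster dictionary, which may be handy if the assignment ever needs to be edited by hand.

```python
import pandas as pd

df = pd.DataFrame({'Item': ['A1', 'B1', 'C1', 'D1'],
                   'Category': ['A', 'A', 'C', 'B']})

# factorize returns integer codes (0-based, in order of first appearance)
# and the distinct values in that same order
codes, uniques = pd.factorize(df['Category'])
df['Label'] = codes + 1  # shift so that labels start at 1

# Illustrative alternative: build an explicit {category: label} dict
# (hypothetical helper name) and apply it with Series.map
mapping = {cat: i for i, cat in enumerate(uniques, start=1)}
df['Label_from_map'] = df['Category'].map(mapping)

print(df)
```

Note that these labels are integers, whereas the desired frame in the question stores them as strings; if strings are needed, .astype(str) on the column should do it.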
jezrael

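With groupby, passing sort=False so that groups are numbered in order of first appearance (ngroup counts from 0, hence the + 1):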

df["Cluster"] = df.groupby("Category", sort=False).ngroup() + 1

>>> df
  Item Category  Cluster
0   A1        A        1
1   B1        A        1
2   C1        C        2
3   D1        B        3
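
As a small sketch of why sort=False matters here: with the default sort=True, groupby orders the groups by key, so B would get 2 and C would get 3, which is not the numbering asked for in the question. The variable names first_seen and alphabetical are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Item': ['A1', 'B1', 'C1', 'D1'],
                   'Category': ['A', 'A', 'C', 'B']})

# sort=False: groups are numbered in the order they first appear (A, C, B)
first_seen = df.groupby('Category', sort=False).ngroup() + 1

# default sort=True: groups are numbered by sorted key (A, B, C)
alphabetical = df.groupby('Category').ngroup() + 1

print(first_seen.tolist())
print(alphabetical.tolist())
```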
not_speshal