
I want to label-encode subgroups in a pandas DataFrame, with the numbering restarting within each category. Something like this:

| Category   | Name   |
| ---------- | ------ |
| FRUITS     | Apple  |
| FRUITS     | Orange |
| FRUITS     | Apple  |
| Vegetables | Onion  |
| Vegetables | Garlic |
| Vegetables | Garlic |

to

| Category   | Name   | Label |
| ---------- | ------ | ----- |
| FRUITS     | Apple  | 1     |
| FRUITS     | Orange | 2     |
| FRUITS     | Apple  | 1     |
| Vegetables | Onion  | 1     |
| Vegetables | Garlic | 2     |
| Vegetables | Garlic | 2     |

Andrej Kesely

2 Answers


Group by "Category", then within each group, group by "Name" and use .ngroup(). With sort=False it numbers the names from 0 in order of first appearance, hence the + 1:

df["Label"] = (
    df.groupby("Category")
    .apply(lambda x: x.groupby("Name", sort=False).ngroup() + 1)
    .values
)
print(df)

Prints:

     Category    Name  Label
0      FRUITS   Apple      1
1      FRUITS  Orange      2
2      FRUITS   Apple      1
3  Vegetables   Onion      1
4  Vegetables  Garlic      2
5  Vegetables  Garlic      2
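
As a self-contained sketch of the same idea, the snippet below builds the sample frame and numbers the names within each category with .ngroup(). It is a variant, not the code above: it uses transform instead of apply(...).values, because .values assigns positionally and so relies on the rows already being ordered by Category, whereas transform keeps the result aligned on the original index. The interleaved row order is an assumption made here purely to show that alignment.

```python
import pandas as pd

# Same rows as in the question, but interleaved (illustrative assumption).
df = pd.DataFrame(
    {
        "Category": ["FRUITS", "Vegetables", "FRUITS", "Vegetables", "FRUITS", "Vegetables"],
        "Name": ["Apple", "Onion", "Orange", "Garlic", "Apple", "Garlic"],
    }
)

# Within each category, number the distinct names in order of first appearance.
# ngroup() is 0-based, so add 1; transform aligns the result on df's index.
df["Label"] = df.groupby("Category")["Name"].transform(
    lambda s: s.groupby(s, sort=False).ngroup() + 1
)
print(df)
```

Each row should get the same label it has in the question's expected table, regardless of where the row sits in the frame.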
Andrej Kesely

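You can use pd.factorize per group. It returns 0-based integer codes in order of first appearance, so add 1 to start the labels at 1: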

df['Label'] = (df.groupby('Category')['Name']
               .transform(lambda x: pd.factorize(x)[0])
               .add(1)
               )

Output:

     Category    Name  Label
0      FRUITS   Apple      1
1      FRUITS  Orange      2
2      FRUITS   Apple      1
3  Vegetables   Onion      1
4  Vegetables  Garlic      2
5  Vegetables  Garlic      2
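
To see why this works, here is a minimal self-contained sketch that first inspects what pd.factorize returns for a single category and then applies it per group. The variable names codes and uniques are illustrative. One thing worth knowing, assuming the default settings: missing values are coded as -1, so they would end up with label 0 after adding 1.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Category": ["FRUITS", "FRUITS", "FRUITS", "Vegetables", "Vegetables", "Vegetables"],
        "Name": ["Apple", "Orange", "Apple", "Onion", "Garlic", "Garlic"],
    }
)

# factorize returns (codes, uniques): codes are 0-based integers assigned
# in order of first appearance, uniques holds the distinct values.
codes, uniques = pd.factorize(df.loc[df["Category"] == "FRUITS", "Name"])

# Apply the same encoding independently inside each category, then shift to 1-based.
df["Label"] = (
    df.groupby("Category")["Name"]
    .transform(lambda x: pd.factorize(x)[0])
    .add(1)
)
print(df)
```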
mozway