Is there a way to use a lambda, or something quicker than a dictionary, to recode a pandas df column of unique categories into integer buckets like 0, 1, 2, etc.?

Question

Is there a quicker way, via lambda or otherwise, to recode every unique value in a pandas df column?

I am trying to recode this without a dictionary or for loop:

   df['Genres'].unique()

array(['Art & Design', 'Art & Design;Pretend Play',
       'Art & Design;Creativity', 'Art & Design;Action & Adventure', 13,
       'Auto & Vehicles', 'Beauty', 'Books & Reference', 'Business',
       'Comics', 'Comics;Creativity', 'Communication', 'Dating',
       'Education', 'Education;Creativity', 'Education;Education',
       'Education;Action & Adventure', 'Education;Pretend Play',...

It goes on for a while - a lot of unique values!

I would like to recode to 0, 1, 2, 3, etc accordingly.

TIA for any advice

If you don't share your attempt or the issue, it's close to random guessing. Please provide a reproducible example as well as the issue/problem you need help with. https://stackoverflow.com/help/how-to-ask — Celius Stingher, Aug 05 '22 at 18:46

score 1 · Accepted Answer · answered Aug 05 '22 at 19:01

This can be done with pd.factorize:

df['Encoding'] = pd.factorize(df['Values'])[0]

Let's say I use your sample as input:

df = pd.DataFrame({'Values':['Art & Design', 'Art & Design;Pretend Play',
       'Art & Design;Creativity', 'Art & Design;Action & Adventure', 13,
       'Auto & Vehicles', 'Beauty', 'Books & Reference', 'Business',
       'Comics', 'Comics;Creativity', 'Communication', 'Dating',
       'Education', 'Education;Creativity', 'Education;Education',
       'Education;Action & Adventure', 'Education;Pretend Play']})

Using the code proposed above, I get:

                             Values  Encoding
0                      Art & Design         0
1         Art & Design;Pretend Play         1
2           Art & Design;Creativity         2
3   Art & Design;Action & Adventure         3
4                                13         4
5                   Auto & Vehicles         5
6                            Beauty         6
7                 Books & Reference         7
8                          Business         8
9                            Comics         9
10                Comics;Creativity        10
11                    Communication        11
12                           Dating        12
13                        Education        13
14             Education;Creativity        14
15              Education;Education        15
16     Education;Action & Adventure        16
17           Education;Pretend Play        17
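
As a minimal sketch of what pd.factorize hands back, assuming a shortened, made-up sample rather than the full genre list: it returns a pair (codes, uniques), where the codes follow order of first appearance by default, and the uniques can be used to map codes back to labels.

```python
import pandas as pd

# Hypothetical sample data, shortened for illustration.
df = pd.DataFrame({'Values': ['Comics', 'Art & Design', 'Comics', 'Beauty']})

# factorize returns (codes, uniques); codes follow order of first appearance.
codes, uniques = pd.factorize(df['Values'])
df['Encoding'] = codes

# sort=True numbers the categories in sorted order instead.
sorted_codes, sorted_uniques = pd.factorize(df['Values'], sort=True)

# Indexing uniques with the codes recovers the original labels.
decoded = uniques[codes]
```

Keeping uniques around is what makes the encoding reversible; taking only [0] discards it.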

score 0 · Answer 2 · answered Aug 05 '22 at 19:02

I think you want to assign each genre to its index in

df['Genres'].unique()

Then you can simply call this

uniques = df['Genres'].unique().tolist()  # unique() returns a NumPy array, which has no .index()
df['recodes'] = df['Genres'].apply(lambda x: uniques.index(x))

Nuri Taş
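
A runnable sketch of the index-lookup idea, assuming a small made-up Genres column. Note that Series.unique() returns a NumPy array, which has no .index() method, so the sketch converts it to a list once up front. The last line checks that the result agrees with pd.factorize, which also numbers values by first appearance.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({'Genres': ['Comics', 'Art & Design', 'Comics', 'Beauty']})

# Convert the array of uniques to a list once, outside the lambda.
uniques = df['Genres'].unique().tolist()

# list.index() returns the position of the first match.
df['recodes'] = df['Genres'].apply(lambda x: uniques.index(x))

# Compare with pd.factorize, which also numbers by first appearance.
same = (df['recodes'] == pd.factorize(df['Genres'])[0]).all()
```

Each list.index() call is a linear scan, so this gets slow when there are many rows and many unique values.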


score 0 · Answer 3 · answered Aug 05 '22 at 19:07

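You can do something really dumb (literally) like pd.get_dummies(df["Genres"]).to_numpy().argmax(axis=1). Note that calling .idxmax(axis=1) on the dummies returns the genre label itself, not an integer, so take the positional argmax instead.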

Go with the factorization one above. Can't beat that one.

O.rka
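
A small sketch of the dummies route, assuming a made-up string-only Genres column. On the dummy frame, idxmax(axis=1) gives back the column label, while a positional argmax gives an integer. For plain string data the dummy columns come out sorted, so these integers line up with pd.factorize(..., sort=True) rather than with the default first-appearance order.

```python
import pandas as pd

# Hypothetical sample data for illustration (strings only).
df = pd.DataFrame({'Genres': ['Comics', 'Art & Design', 'Comics', 'Beauty']})

# One indicator column per genre; dtype=int keeps the values as 0/1.
dummies = pd.get_dummies(df['Genres'], dtype=int)

# idxmax returns the label of the column holding the 1, i.e. the genre itself.
labels = dummies.idxmax(axis=1)

# The positional argmax gives an integer code per row.
codes_from_dummies = dummies.to_numpy().argmax(axis=1)

# For comparison: factorize with sorted categories.
codes_sorted = pd.factorize(df['Genres'], sort=True)[0]
```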

