-2

Is there a quicker way, via lambda or otherwise, to recode every unique value in a pandas DataFrame column?

I am trying to recode this without a dictionary or for loop:

   df['Genres'].unique()

array(['Art & Design', 'Art & Design;Pretend Play',
       'Art & Design;Creativity', 'Art & Design;Action & Adventure', 13,
       'Auto & Vehicles', 'Beauty', 'Books & Reference', 'Business',
       'Comics', 'Comics;Creativity', 'Communication', 'Dating',
       'Education', 'Education;Creativity', 'Education;Education',
       'Education;Action & Adventure', 'Education;Pretend Play',...

It goes on for a while - a lot of unique values!

I would like to recode these to 0, 1, 2, 3, etc., with one integer per unique value.

TIA for any advice

Celius Stingher
  • If you don't share your attempt or the issue, it's close to random guessing. Please provide a reproducible example as well as the issue/problem you need help with. https://stackoverflow.com/help/how-to-ask – Celius Stingher Aug 05 '22 at 18:46

3 Answers

1

This can be done with pd.factorize, which assigns each unique value an integer code in order of first appearance:

df['Encoding'] = pd.factorize(df['Values'])[0]

Let's say I use your sample as input:

df = pd.DataFrame({'Values':['Art & Design', 'Art & Design;Pretend Play',
       'Art & Design;Creativity', 'Art & Design;Action & Adventure', 13,
       'Auto & Vehicles', 'Beauty', 'Books & Reference', 'Business',
       'Comics', 'Comics;Creativity', 'Communication', 'Dating',
       'Education', 'Education;Creativity', 'Education;Education',
       'Education;Action & Adventure', 'Education;Pretend Play']})

Using the code proposed above, I get:

                             Values  Encoding
0                      Art & Design         0
1         Art & Design;Pretend Play         1
2           Art & Design;Creativity         2
3   Art & Design;Action & Adventure         3
4                                13         4
5                   Auto & Vehicles         5
6                            Beauty         6
7                 Books & Reference         7
8                          Business         8
9                            Comics         9
10                Comics;Creativity        10
11                    Communication        11
12                           Dating        12
13                        Education        13
14             Education;Creativity        14
15              Education;Education        15
16     Education;Action & Adventure        16
17           Education;Pretend Play        17
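
To make the behaviour with repeated values visible, here is a minimal sketch on a small hypothetical sample (the column contents are made up for illustration, not taken from the asker's data). It shows that pd.factorize returns both the codes and the unique values, so the encoding can be reversed:

```python
import pandas as pd

# Hypothetical sample with repeated values, including the stray integer 13.
df = pd.DataFrame({'Genres': ['Art & Design', 'Beauty', 'Art & Design', 13, 'Beauty']})

# factorize returns (codes, uniques); codes follow order of first appearance,
# so repeated values share the same integer.
codes, uniques = pd.factorize(df['Genres'])
df['Encoding'] = codes

# Indexing uniques by the codes recovers the original value for each row.
decoded = uniques[codes]

print(df)
print(list(decoded))
```

By default, missing values are not given a code of their own and are encoded as -1.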
Celius Stingher
0

I think you want to assign each genre to its index in

df['Genres'].unique()

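Then you can look each value up in that list. Note that unique() returns a NumPy array, which has no .index method, so convert it to a list first:

genres = df['Genres'].unique().tolist()
df['recodes'] = df['Genres'].apply(genres.index)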
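
A minimal sketch of this lookup-by-position idea, assuming a small hypothetical sample. The unique values are computed once and converted to a list, because the NumPy array returned by unique() has no .index method. The last line checks that the result matches pd.factorize:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({'Genres': ['Comics', 'Dating', 'Comics', 'Education', 'Dating']})

# Compute the unique values once, as a plain Python list.
genres = df['Genres'].unique().tolist()

# list.index is a linear search, so the cost grows with
# (number of rows) x (number of unique values).
df['recodes'] = df['Genres'].apply(genres.index)

# Both approaches number values in order of first appearance.
same = (df['recodes'] == pd.factorize(df['Genres'])[0]).all()
print(df, same)
```

For a column with many unique values the repeated linear search is likely to be noticeably slower than factorize, which is one reason to prefer the latter.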
Nuri Taş
0

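You can do something really dumb (literally) like pd.get_dummies(df["Genres"]).to_numpy().argmax(axis=1). Calling .idxmax(axis=1) on the dummies instead would return the genre labels themselves rather than integer codes.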

Go with the factorization one above. Can't beat that one.
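
For comparison, a small sketch of the one-hot route next to factorize, on hypothetical sample data. It illustrates two things: idxmax(axis=1) gives back column labels, while argmax on the underlying array gives column positions; and those positions follow the sorted column order of get_dummies, so they line up with pd.factorize(..., sort=True) rather than with first-appearance order:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({'Genres': ['Comics', 'Beauty', 'Comics', 'Dating']})

# One indicator column per unique value; columns come out sorted.
dummies = pd.get_dummies(df['Genres'])

# idxmax returns the label of the column holding the 1 in each row.
labels = dummies.idxmax(axis=1)

# argmax on the raw array returns the position of that column instead.
positions = dummies.to_numpy().argmax(axis=1)

# factorize with sort=True numbers the values in sorted order as well.
sorted_codes = pd.factorize(df['Genres'], sort=True)[0]

print(labels.tolist(), positions.tolist(), sorted_codes.tolist())
```

The one-hot route also builds a full rows-by-unique-values table just to throw it away, which supports the advice to use factorize.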

O.rka