
I have the following DataFrame:

df = pd.DataFrame({'colA':['a', 'a', 'a', 'b' ,'b'], 'colB':['a', 'b', 'a', 'c', 'b'], 'colC':['x', 'x', 'y', 'y', 'y']})

I would like to write a function that replaces each value with its frequency count in that column. For example, colA would become [3, 3, 3, 2, 2].

I attempted this by building a dictionary that maps each value to its frequency count, assigning that dictionary to a variable freq, and then mapping the column values through freq. I wrote the following function:

def LabelEncode_method1(col): 
   freq = col.value_counts().to_dict()
   col = col.map(freq)
   return col.head()

When I run LabelEncode_method1(df.colA), I get the result 3, 3, 3, 2, 2. However, when I inspect df afterwards, the values in colA are still 'a', 'a', 'a', 'b', 'b'.

  1. What am I doing wrong? How do I fix my function?
  2. How do I write another function that loops through all columns and maps the values to freq, instead of calling the function separately for each of the 3 columns?
cs95
The Rookie

2 Answers


You can use groupby + transform, which here stores the counts in a new column:

df['new'] = df.groupby('colA')['colA'].transform('count')
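The same groupby + transform idea can be extended to every column. The following is only a sketch, assuming the DataFrame from the question; the name counts is a hypothetical variable, and the result is built as a separate frame so that df itself is left unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'colA': ['a', 'a', 'a', 'b', 'b'],
    'colB': ['a', 'b', 'a', 'c', 'b'],
    'colC': ['x', 'x', 'y', 'y', 'y'],
})

# For each column, group by its own values and broadcast the group size
# back onto every row. `counts` is a hypothetical name for the result.
counts = pd.DataFrame({
    col: df.groupby(col)[col].transform('count') for col in df.columns
})
print(counts)
```

To overwrite the original columns instead, the result could be assigned back with df = counts.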
BENY

You can use map + value_counts, which you have already found. Series.map returns a new Series rather than modifying the column in place, so you just need to assign the result back to your DataFrame.

df['colA'].map(df['colA'].value_counts())

0    3
1    3
2    3
3    2
4    2
Name: colA, dtype: int64
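As a sketch of how the function from the question might be fixed, assuming the same DataFrame: return the full mapped Series (without head()) and assign it back to the column. The name frequency_encode is hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'colA': ['a', 'a', 'a', 'b', 'b'],
    'colB': ['a', 'b', 'a', 'c', 'b'],
    'colC': ['x', 'x', 'y', 'y', 'y'],
})

def frequency_encode(col):
    """Return a new Series with each value replaced by its count in col.

    Hypothetical helper name; the input Series is not modified.
    """
    return col.map(col.value_counts())

# The assignment is what actually changes the DataFrame.
df['colA'] = frequency_encode(df['colA'])
print(df['colA'].tolist())
```

Without the assignment on the last lines, df stays as it was, which is the behaviour described in the question.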

To do this for all columns at once (this creates a new DataFrame and leaves df unchanged):

pd.concat([
  df[col].map(df[col].value_counts()) for col in df
], axis=1)

   colA  colB  colC
0     3     2     2
1     3     2     2
2     3     2     3
3     2     1     3
4     2     2     3
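An alternative to the concat approach, shown here only as a sketch under the assumption that every column should be encoded, is DataFrame.apply, which passes each column to a function in turn. The name encoded is a hypothetical variable.

```python
import pandas as pd

df = pd.DataFrame({
    'colA': ['a', 'a', 'a', 'b', 'b'],
    'colB': ['a', 'b', 'a', 'c', 'b'],
    'colC': ['x', 'x', 'y', 'y', 'y'],
})

# apply calls the lambda once per column (the default, axis=0) and
# collects the returned Series into a new DataFrame.
encoded = df.apply(lambda col: col.map(col.value_counts()))
print(encoded)
```

As with the concat version, this returns a new DataFrame; use df = encoded if the original should be replaced.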
user3483203