I have the following dataframe:
```python
import pandas as pd

df = pd.DataFrame({'colA': ['a', 'a', 'a', 'b', 'b'],
                   'colB': ['a', 'b', 'a', 'c', 'b'],
                   'colC': ['x', 'x', 'y', 'y', 'y']})
```
I would like to write a function that replaces each value with its frequency count in that column. For example, colA would become [3, 3, 3, 2, 2].
I attempted this by creating a dictionary that maps each value to its frequency count, assigning that dictionary to a variable `freq`, and then mapping the column values through `freq`. I wrote the following function:
```python
def LabelEncode_method1(col):
    # build a {value: count} dictionary for this column
    freq = col.value_counts().to_dict()
    # replace each value with its count
    col = col.map(freq)
    return col.head()
```
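The dictionary-and-map step inside the function can be checked on its own. This minimal sketch uses only `value_counts`, `to_dict` and `map` on the example dataframe, and shows that the mapping itself produces the expected counts:

```python
import pandas as pd

df = pd.DataFrame({'colA': ['a', 'a', 'a', 'b', 'b'],
                   'colB': ['a', 'b', 'a', 'c', 'b'],
                   'colC': ['x', 'x', 'y', 'y', 'y']})

# value_counts() counts the occurrences of each distinct value in the column
freq = df['colA'].value_counts().to_dict()

# map() looks each value up in the dict and returns a NEW Series
encoded = df['colA'].map(freq)
print(encoded.tolist())  # [3, 3, 3, 2, 2]
```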
When I run `LabelEncode_method1(df.colA)`, I get the result 3, 3, 3, 2, 2. However, when I display the dataframe `df`, the values in `colA` are still 'a', 'a', 'a', 'b', 'b'.
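This behavior is consistent with `Series.map` returning a new Series rather than modifying the column in place; rebinding the local name `col` inside the function never touches `df`. A minimal sketch, assuming the fix is simply to return the full mapped Series (not `.head()`) and assign it back to the column. `label_encode` is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({'colA': ['a', 'a', 'a', 'b', 'b'],
                   'colB': ['a', 'b', 'a', 'c', 'b'],
                   'colC': ['x', 'x', 'y', 'y', 'y']})

def label_encode(col):
    # Hypothetical helper: returns a new Series where each value is replaced
    # by its count (a Series can be passed to map() just like a dict)
    return col.map(col.value_counts())

label_encode(df['colA'])      # result is computed and then discarded
print(df['colA'].tolist())    # ['a', 'a', 'a', 'b', 'b'] (df unchanged)

df['colA'] = label_encode(df['colA'])  # assigning the result back updates df
print(df['colA'].tolist())    # [3, 3, 3, 2, 2]
```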
- What am I doing wrong, and how do I fix my function?
- How do I write another function that loops through all the columns and maps each column's values to its own `freq` dictionary, instead of calling the function three separate times, once per column?
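One way to handle every column is a plain loop over `df.columns` that applies the same value-to-count mapping to each column. This is a minimal sketch; `frequency_encode` is a hypothetical name, and it returns a modified copy rather than changing `df` in place:

```python
import pandas as pd

df = pd.DataFrame({'colA': ['a', 'a', 'a', 'b', 'b'],
                   'colB': ['a', 'b', 'a', 'c', 'b'],
                   'colC': ['x', 'x', 'y', 'y', 'y']})

def frequency_encode(frame):
    # Hypothetical helper: returns a copy of frame in which every value is
    # replaced by how often it occurs in its own column
    out = frame.copy()
    for name in out.columns:
        out[name] = out[name].map(out[name].value_counts())
    return out

encoded = frequency_encode(df)
for name in encoded.columns:
    print(name, encoded[name].tolist())
# colA [3, 3, 3, 2, 2]
# colB [2, 2, 2, 1, 2]
# colC [2, 2, 3, 3, 3]
```

To update the original dataframe, reassign it with `df = frequency_encode(df)`. A loop-free alternative is `df.apply(lambda s: s.map(s.value_counts()))`, since `DataFrame.apply` passes each column to the function as a Series by default.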