I have a pandas DataFrame, combined_copy, with a column, job_industry_category, that contains several categories. I want to use a mapping to keep only the top 3 categories by frequency and group all the rest into a single "other" category. I intend to encode them as integers as follows:

combined_copy['job_industry_category'].map({'Manufacturing' : 0, 'Financial Services' : 1, 'Health' : 2})

How can I map all the remaining categories to a single class, 3?

The list of all categories in this column is as follows: ['Manufacturing', 'Financial Services', 'Not Specified', 'Health', 'Retail', 'Property', 'IT', 'Entertainment', 'Argiculture', 'Telecommunications']

I have tried using the na_action argument:

combined_copy['job_industry_category'].map({'Manufacturing' : 0, 'Financial Services' : 1, 'Health' : 2}, na_action={None : 3})

but the remaining categories still come out as NaN. Any help would be appreciated.

Sting_ZW
  • [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – It_is_Chris Nov 09 '22 at 13:59

1 Answer

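We can write a mapping function that returns 3 for any category outside the top three:

>>> def com_map(x):
...     if x == 'Manufacturing':
...         return 0
...     elif x == 'Financial Services':
...         return 1
...     elif x == 'Health':
...         return 2
...     else:
...         return 3  # every other category
...

and pass it to map:

>>> combined_copy['job_industry_category'].map(com_map)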
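
As an alternative to a hand-written function, here is a minimal sketch that keeps the dictionary from the question and fills the unmapped values afterwards. The sample DataFrame and the names top3 and encoded are made up for illustration; only the column name and the three mapped categories come from the question. Note that na_action only accepts None or 'ignore' and controls whether NaN inputs are passed to the mapper, so it cannot supply a default value.

```python
import pandas as pd

# Hypothetical sample data standing in for combined_copy
combined_copy = pd.DataFrame({
    "job_industry_category": [
        "Manufacturing", "Financial Services", "Health",
        "Retail", "IT", "Not Specified",
    ]
})

# Top 3 categories and their codes, as given in the question
top3 = {"Manufacturing": 0, "Financial Services": 1, "Health": 2}

# map() leaves categories missing from the dict as NaN;
# fillna(3) then assigns all of them to class 3.
encoded = (
    combined_copy["job_industry_category"]
    .map(top3)
    .fillna(3)
    .astype(int)
)

print(encoded.tolist())  # [0, 1, 2, 3, 3, 3]
```

With this approach, values that were already missing in the original column also end up in class 3, which may or may not be what you want. Passing a callable such as lambda x: top3.get(x, 3) to map should give the same result in a single step.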
Y U