I have the following pandas dataframe.

        epi_week    state   loc_type    disease    cases    incidence
21835   200011      WY      STATE       MUMPS       2       0.40
21836   197501      WY      STATE       POLIO       3       0.76
21837   199607      WY      STATE       HEPATITIS   3       0.61
21838   197116      WY      STATE       MUMPS       6       1.73
21839   200048      WY      STATE       HEPATITIS   6       1.21

I am trying to replace each disease with a unique integer, for example 'MUMPS' == 1, 'POLIO' == 2, etc. The final dataframe should look as follows:

        epi_week    state   loc_type    disease    cases    incidence
21835   200011      WY      STATE       1          2        0.40
21836   197501      WY      STATE       2          3        0.76
21837   199607      WY      STATE       3          3        0.61
21838   197116      WY      STATE       1          6        1.73
21839   200048      WY      STATE       3          6        1.21

I am using the following code:

# creating a dictionary     
disease_dic = {'MUMPS':1, 'POLIO':2, 'MEASLES':3, 'RUBELLA':4,
               'PERTUSSIS':5, 'HEPATITIS A':6, 'SMALLPOX':7, 
               'DIPHTHERIA':8}
data.disease = [disease_dic[item] for item in data.disease]

I am getting the following error:

KeyError                                  Traceback (most recent call last)
<ipython-input-115-52394901c90d> in <module>()
----> 1 cdc.disease = [disease_dic[item2] for item2 in cdc.disease]

KeyError: 1

Can anyone help with this issue? Thank you.

wahidd

1 Answer

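Use apply with the dictionary's get method, which returns None for a missing key instead of raising a KeyError.

Example: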

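import pandas as pd

# Map each disease name to its integer code
disease_dic = {'MUMPS': 1, 'POLIO': 2, 'MEASLES': 3, 'RUBELLA': 4,
               'PERTUSSIS': 5, 'HEPATITIS A': 6, 'SMALLPOX': 7,
               'DIPHTHERIA': 8}

# Demo frame with one row per dictionary key
df = pd.DataFrame({"disease": list(disease_dic)})
# dict.get returns None for a missing key instead of raising KeyError
print(df["disease"].apply(lambda x: disease_dic.get(x)))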

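Output (row order follows the dictionary's iteration order, which is arbitrary before Python 3.7; on newer versions the rows come out as 1 through 8):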

0    4
1    2
2    1
3    8
4    3
5    5
6    7
7    6
Name: disease, dtype: int64
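
As for the error itself: KeyError: 1 means the lookup key was the integer 1, so the column probably already held integers when the line ran. One plausible cause (an assumption, since the traceback uses cdc while the posted snippet uses data) is that the cell had already been run once and overwrote the column. Separately, the sample rows contain 'HEPATITIS' while the dictionary only has 'HEPATITIS A', so a fresh run would fail on that label too. The sketch below uses Series.map, which leaves NaN for missing keys, and writes to a new column so reruns are safe. The small cdc frame and the disease_code column name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample mirroring the disease column from the question.
cdc = pd.DataFrame({
    "disease": ["MUMPS", "POLIO", "HEPATITIS", "MUMPS", "HEPATITIS"],
})

disease_dic = {'MUMPS': 1, 'POLIO': 2, 'MEASLES': 3, 'RUBELLA': 4,
               'PERTUSSIS': 5, 'HEPATITIS A': 6, 'SMALLPOX': 7,
               'DIPHTHERIA': 8}

# Series.map looks each value up in the dict; labels without an
# entry become NaN instead of raising KeyError.
codes = cdc["disease"].map(disease_dic)

# List the labels that have no entry in the dictionary.
unmapped = sorted(cdc.loc[codes.isna(), "disease"].unique())
print(unmapped)

# Store the codes in a new column (name is an assumption) so the
# original labels survive and the cell can be rerun without errors.
# The nullable "Int64" dtype keeps integers alongside missing values.
cdc["disease_code"] = codes.astype("Int64")
print(cdc)
```

Once every label in unmapped has been added to the dictionary (or the labels have been cleaned up), the new column can replace the original one if that is still wanted.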
Rakesh