I have data like the following:
CANDIDATE_ID | Job1_Skill1 |
---|---|
12 | conflict management |
13 | asset management |
I want to add one-hot encoded columns for each skill in the table, using Python and pandas, based on a reference skill set (a list). For example, if the reference skill set is [conflict management, asset management, .net], then my output should look like this:
CANDIDATE_ID | Job1_Skill1 | FP_conflict management | FP_asset management | FP_.net |
---|---|---|---|---|
12 | conflict management | 1 | 0 | 0 |
13 | asset management | 0 | 1 | 0 |
I could do it by comparing row by row, but that does not seem efficient. Can anyone suggest an efficient way to do this in Python?
The get_dummies method encodes based on the values present in the column itself, but I need to encode against a specific reference list. In the example above, get_dummies would produce only FP_conflict management and FP_asset management, not FP_.net, and its output columns would change from one dataframe to the next. I need the same columns, based on the fixed list of skills, for every dataframe. Because the values have to be compared against a separate reference rather than against the column's own values, get_dummies cannot be used as it stands.
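For illustration, here is a minimal sketch of one way the desired output might be produced without a row-by-row loop. It assumes the skill strings in the column match the reference list exactly (same case and spacing); the names `reference_skills`, `skills`, `dummies`, and `result` are made up for this example. The idea is to cast the column to a categorical with fixed categories first, so that get_dummies emits one column per reference skill, whether or not that skill appears in the data.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "CANDIDATE_ID": [12, 13],
    "Job1_Skill1": ["conflict management", "asset management"],
})

# Fixed reference list (assumed to match the column values exactly)
reference_skills = ["conflict management", "asset management", ".net"]

# Casting to a categorical with fixed categories pins the set of output
# columns to the reference list; values outside the list become NaN
skills = df["Job1_Skill1"].astype(pd.CategoricalDtype(categories=reference_skills))

# One indicator column per category, including categories absent from the data;
# rows whose skill is not in the reference list get all zeros
dummies = pd.get_dummies(skills, prefix="FP", dtype=int)

# Append the indicator columns to the original frame (aligned on the index)
result = pd.concat([df, dummies], axis=1)
print(result)
```

Since the categories come from the reference list rather than from the data, every dataframe processed this way should end up with the same FP_ columns in the same order.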