I asked this question and it helped me, but now my task is more complex.
My dataframe has ~100 columns and values with 14 scales.
{'Diseasetype': {0: 'Oncology',
1: 'Oncology',
2: 'Oncology',
3: 'Nononcology',
4: 'Nononcology',
5: 'Nononcology'},
'Procedures1': {0: 100, 1: 300, 2: 500, 3: 200, 4: 400, 5: 1000},
'Procedures2': {0: 1, 1: 3, 2: 5, 3: 2, 4: 4, 5: 10},
'Procedures100': {0: 1000, 1: 3000, 2: 5000, 3: 2000, 4: 4000, 5: 10000}}
I want to convert each value in each column of the dataframe into a bucket value.
My current solution is:
def encoding(col, labels):
return np.select([col<200, col.between(200,500), col.between(500,1000), col>1000], labels, 0)
onc_labels = [1,2,3,4]
nonc_labels = [11,22,33,44]
msk = df['Therapy_area'] == 'Oncology'
df[cols] = pd.concat((df.loc[msk, cols].apply(encoding, args=(onc_labels,)), df.loc[msk, cols].apply(encoding, args=(nonc_labels,)))).reset_index(drop=True)
It works well, if all columns of the dataframe has the same scale, but they do not. Remember, I have 14 different scales.
I would like to update the code above (or get another solution), which would allow me to bucket data. I cannot use the same range of values for bucketing everything.
My logic is the following:
If Disease
== Oncology
and Procedures1
on this scale, convert values to these buckets (1, 2, 3)
If Disease
== Oncology
and Procedures2
on this scale, convert values to these buckets (1, 2, 3)
If Disease
!= Oncology
and Procedures77
on this scale, convert values to these buckets (4, 5, 6)
Example of a scale and buckets:
Procedures1 for Oncology: < 200 = 1, 200-400 = 2, >400 = 3
Procedures2 for Oncology: < 2 = 1, 2-4 = 2, >4 = 3
Procedures3 for Oncology: < 2000 = 1, 2000-4000 = 2, >4000 = 3
Procedures1 for nonOncology: < 200 = 4, 200-400 = 5, >400 = 6
Procedures2 for nonOncology: < 2 = 4, 2-4 = 5, >4 = 6
Procedures3 for nonOncology: < 2000 = 4, 2000-4000 = 5, >4000 = 6
Expected output (happy to provide more info!)
Diseasetype Procedures1 Procedures2 Procedures100
Oncology 1 1 1
Oncology 2 2 2
Oncology 3 3 3
Nononcology 4 4 4
Nononcology 5 5 5
Nononcology 6 6 6