I have a DataFrame with dozens of columns.
    Therapy area  Procedures1  Procedures2  Procedures3
    Oncology              450          450         2345
    Oncology              367          367          415
    Oncology              152          152         4945
    Oncology              876          876          345
    Oncology             1098         1098           12
    Oncology             1348         1348          234
    Nononcology           225          225          345
    Nononcology           300          300           44
    Nononcology           267          267           45
    Nononcology            90           90         4567
I want to convert the numeric values in all the Procedures columns into buckets.
For a single column, it would look something like this:
    def hello(x):
        # Bucket one row's Procedures1 value; the labels depend on the therapy area.
        # The last bucket uses >= 1000 so that a value of exactly 1000 is not left unmatched.
        if x['Therapy area'] == 'Oncology' and x['Procedures1'] < 200: return 1
        if x['Therapy area'] == 'Oncology' and 200 <= x['Procedures1'] < 500: return 2
        if x['Therapy area'] == 'Oncology' and 500 <= x['Procedures1'] < 1000: return 3
        if x['Therapy area'] == 'Oncology' and x['Procedures1'] >= 1000: return 4
        if x['Therapy area'] != 'Oncology' and x['Procedures1'] < 200: return 11
        if x['Therapy area'] != 'Oncology' and 200 <= x['Procedures1'] < 500: return 22
        if x['Therapy area'] != 'Oncology' and 500 <= x['Procedures1'] < 1000: return 33
        if x['Therapy area'] != 'Oncology' and x['Procedures1'] >= 1000: return 44

    test['Procedures1'] = test.apply(hello, axis=1)
What is the most efficient way to apply this to dozens of columns with different column names (not Procedures1, Procedures2, Procedures3, etc.)?
When I use cut with specific bins, I get this error:

    ValueError: bins must increase monotonically.
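For context, here is a minimal sketch of how this error can arise. pandas raises it when the bin edges passed to `pd.cut` are not in ascending order. The edge values below are hypothetical, chosen only to trigger the message, and are not the bins actually used in the failing call.

```python
import pandas as pd

values = pd.Series([450, 367, 152, 876, 1098, 1348])

# Hypothetical bin edges, deliberately out of order to trigger the error.
unsorted_bins = [0, 500, 200, 1000, float("inf")]

try:
    pd.cut(values, bins=unsorted_bins)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# The same edges in ascending order are accepted.
# right=False makes each interval closed on the left: [0, 200), [200, 500), ...
sorted_bins = sorted(unsorted_bins)
buckets = pd.cut(values, bins=sorted_bins, labels=[1, 2, 3, 4], right=False)
print(buckets.tolist())
```

Sorting the edges removes the error, but a single set of labels still does not cover the case where the labels depend on another column.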
Since the bucket values can differ, how can I solve this with logical operations rather than bins?
Also, the bucket values depend on the "Therapy area" column: for example, 11, 22, 33, 44 for Nononcology and 1, 2, 3, 4 for Oncology.
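To make the desired result concrete, here is a rough sketch of one possible vectorized approach, using `np.digitize` and `np.where` instead of a row-wise `apply`. It rests on several assumptions: every column other than "Therapy area" should be bucketed, the thresholds are the same for all of those columns, and a value of exactly 1000 belongs in the top bucket. The names `value_cols`, `edges`, and `bucketed` are illustrative only.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({
    "Therapy area": ["Oncology"] * 6 + ["Nononcology"] * 4,
    "Procedures1": [450, 367, 152, 876, 1098, 1348, 225, 300, 267, 90],
    "Procedures2": [450, 367, 152, 876, 1098, 1348, 225, 300, 267, 90],
    "Procedures3": [2345, 415, 4945, 345, 12, 234, 345, 44, 45, 4567],
})

# Assumption: every column except "Therapy area" should be bucketed,
# so the actual column names do not matter.
value_cols = test.columns.drop("Therapy area")

# Assumed thresholds: <200, 200-499, 500-999, >=1000.
edges = [200, 500, 1000]
oncology_labels = np.array([1, 2, 3, 4])
other_labels = np.array([11, 22, 33, 44])

# Column vector of booleans so it broadcasts across all value columns.
is_oncology = (test["Therapy area"] == "Oncology").to_numpy()[:, None]

# np.digitize returns 0..3: the index of the bucket each value falls into.
bucket_idx = np.digitize(test[value_cols].to_numpy(), edges)

# Pick the label set per row, based on the therapy area.
bucketed = test.copy()
bucketed[value_cols] = np.where(
    is_oncology, oncology_labels[bucket_idx], other_labels[bucket_idx]
)
print(bucketed)
```

Because the bucket index is computed once and the labels are looked up afterwards, the label arrays do not have to be increasing, which sidesteps the monotonic-bins restriction of `cut`.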