0

Still learning the nuances of python so please excuse me if the question is trivial for some.

I have a data frame of format:

enter image description here

I want to replace all columns except the index in a way such that:

  1. values between 0.1 and 0.4 are replaced with 'low'
  2. values between 0.4 and 0.6 are replaced with 'medium'
  3. values between 0.6 and 1 are replaced with "high"

So, the resulting column will look like:

enter image description here

I am aware that I can replace a specific column values using something like:

Referred here: https://kanoki.org/2019/07/17/pandas-how-to-replace-values-based-on-conditions/ Replace a specific range of values in a pandas dataframe

df['column1'] = np.where(df['column1'] < 0.4, 'Low', df['column1'])
df['column1'] = np.where(df['column1'].between(0.4,0.6), 'Medium', df['column1'])
df['column1'] = np.where(df['column1'] > 0.6, 'High', df['column1'])

Also, there is one issue with above approach. After first replacement of values I get error enter image description here
Probably because, the string 'low' is now present in df and is causing trouble when comparing with int values.

Is there a way to replace for all three columns in a more pythonic way? One may ignore first column because I might as well drop it temporarily and then add it back after value replacement in column1/2/3. My main question is how to replace for all columns in a concise and pythonic way.

Thanks,

Aisha
  • 91
  • 7

1 Answers1

5

You can use numpy.select:

>>> import numpy as np
>>> subset = df[['column1', 'column2', 'column3']]
>>> df.loc[:, subset.columns] =  np.select([subset < 0.4,
                                           (0.4 <= subset) & (subset <= 0.6),
                                           subset > 0.6],
                                           ['Low', 'Medium', 'High'])

   my_index column1 column2 column3
0        54  Medium  Medium    High
1       101     Low  Medium    High
2        75    High    High     Low
Sayandip Dutta
  • 15,602
  • 4
  • 23
  • 52