How to replace values in a column based on conditions from multiple columns in pandas

Question

I want to replace the values in the 'Risk Rating' column if and only if three conditions on three different columns of the dataframe are met. I tried both a boolean mask and the .loc method, but neither worked for me. Only 9 rows are affected, and for this one case I want to change the 'Risk Rating' value from 0 to 9. The dataframe has 180002 rows. Here is the code that I wrote:

safety.loc[((safety['Employee Name']=="Shabbir Hussain") & (safety['Employee Number']==11231) & 
(safety['Attendance Date']=="2020-03-12")),['Risk Rating']]=9

mask = ((safety['Employee Name']=="Shakir Hussain") & (safety['Employee Number']==11026) & 
(safety['Attendance Date']=="2020-03-12") & (safety['Risk Rating']==0))
safety['Risk Rating'][mask]=9

ThePyGuy · Accepted Answer · 2021-02-20T11:06:34.900


# Wrap the whole expression in parentheses so it can span several lines
mask = ((safety['Employee Name']=="Shakir Hussain") & 
        (safety['Employee Number']==11026) & 
        (safety['Attendance Date']=="2020-03-12") & 
        (safety['Risk Rating']==0))

To assign values conditionally, use .loc with the boolean mask to select the matching rows, then assign the new value to the target column.

safety.loc[mask, 'Risk Rating']=9

Alternatively, you can use NumPy's np.select to apply the mask (this needs import numpy as np):

safety['Risk Rating'] = np.select([mask], [9], default=safety['Risk Rating'])
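To make the two options above concrete, here is a minimal sketch on a small made-up frame. The three rows are hypothetical stand-ins for the real `safety` data (only the column names come from the question), and the copies `by_loc` and `by_select` exist only so both variants can be compared side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real `safety` frame.
safety = pd.DataFrame({
    "Employee Name": ["Shakir Hussain", "Shakir Hussain", "Someone Else"],
    "Employee Number": [11026, 11026, 20000],
    "Attendance Date": ["2020-03-12", "2020-03-13", "2020-03-12"],
    "Risk Rating": [0, 0, 0],
})

# All four conditions must hold; parentheses let the expression span lines.
mask = (
    (safety["Employee Name"] == "Shakir Hussain")
    & (safety["Employee Number"] == 11026)
    & (safety["Attendance Date"] == "2020-03-12")
    & (safety["Risk Rating"] == 0)
)

# Option 1: assign through .loc on the rows selected by the mask.
by_loc = safety.copy()
by_loc.loc[mask, "Risk Rating"] = 9

# Option 2: np.select picks 9 where the mask is True, else the old value.
by_select = safety.copy()
by_select["Risk Rating"] = np.select([mask], [9], default=safety["Risk Rating"])

print(by_loc)
print(by_select)
```

Both variants should change only the first row here. The .loc form modifies the column in place, while np.select builds a whole new column, which matters mostly when several condition/value pairs are needed at once.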

edited Feb 20 '21 at 11:06

answered Feb 20 '21 at 10:29


The code runs perfectly but it still does not give the desired results. The value is still 0. – Hammad Malick Feb 20 '21 at 10:56

That's because no rows in the dataframe match the mask criteria. Can you try printing mask.sum()? – ThePyGuy Feb 20 '21 at 10:58
I tried it for another row and this worked. You're right, maybe the record isn't in the dataframe. – Hammad Malick Feb 20 '21 at 11:02
Another possible issue is the datetime format. Maybe your `Attendance Date` column is not a string but a `pandas datetime`. – Muhammad Yasirroni Feb 20 '21 at 11:03
I converted it into string format as datetime format couldn't be processed. – Hammad Malick Feb 20 '21 at 11:06
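The debugging steps from these comments can be sketched as follows: count the matches for each condition separately, check the column dtypes, and compare a datetime column against a `pd.Timestamp`. The toy rows are hypothetical, and the trailing space in the second name is an invented example of a value that looks right but does not match.

```python
import pandas as pd

# Hypothetical toy data; 'Attendance Date' is a real datetime column here.
safety = pd.DataFrame({
    "Employee Name": ["Shakir Hussain", "Shakir Hussain "],  # second has a trailing space
    "Employee Number": [11026, 11026],
    "Attendance Date": pd.to_datetime(["2020-03-12", "2020-03-12"]),
    "Risk Rating": [0, 0],
})

# Inspect the dtypes first: a datetime64 column is not a string column.
print(safety.dtypes)

name_ok = safety["Employee Name"] == "Shakir Hussain"
number_ok = safety["Employee Number"] == 11026
date_ok = safety["Attendance Date"] == pd.Timestamp("2020-03-12")

# Count the matches per condition to see which one filters rows out.
counts = {
    "name": int(name_ok.sum()),
    "number": int(number_ok.sum()),
    "date": int(date_ok.sum()),
}
print(counts)

mask = name_ok & number_ok & date_ok
matched = int(mask.sum())
print("rows matched:", matched)

# Only assign once the mask selects the rows you expect.
safety.loc[mask, "Risk Rating"] = 9
```

If `matched` is 0, the assignment runs without error but changes nothing, which is the symptom described above.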

Muhammad Yasirroni · Answer 2 · 2021-02-20T11:10:44.000


Improving on @Bikhyat Adhiakri's answer: since you will process thousands of rows, use NumPy instead:

import numpy as np

arr = safety.to_numpy()

# replace 0, 1, 2 with the positions of your columns
mask = (arr[:, 0] == "Shakir Hussain") & (arr[:, 1] == 11026) & (arr[:, 2] == "2020-03-12")

# 4 is the assumed position of 'Risk Rating'; adjust it to your data
arr[mask, 4] = 9  # but your data will be in NumPy format, not a DataFrame

# or you can write back to the DataFrame with
# safety.loc[mask, 'Risk Rating'] = 9

NumPy might make the process up to 1000 times faster when the number of rows is large.

See: https://stackoverflow.com/a/64504183/11671779
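A variant of the approach above, as a sketch only: build the mask from per-column NumPy arrays instead of converting the whole frame, then write back with .loc so the result stays a DataFrame. The toy rows are hypothetical, the date column is assumed to hold strings, and no timing is claimed for this version.

```python
import pandas as pd

# Hypothetical toy data; 'Attendance Date' is assumed to be stored as strings.
safety = pd.DataFrame({
    "Employee Name": ["Shakir Hussain", "Someone Else", "Shakir Hussain"],
    "Employee Number": [11026, 11026, 11026],
    "Attendance Date": ["2020-03-12", "2020-03-12", "2020-03-13"],
    "Risk Rating": [0, 0, 0],
})

# Pull each column out as a NumPy array, addressed by name rather than position.
names = safety["Employee Name"].to_numpy()
numbers = safety["Employee Number"].to_numpy()
dates = safety["Attendance Date"].to_numpy()

# Element-wise comparisons give boolean arrays; & combines them.
mask = (names == "Shakir Hussain") & (numbers == 11026) & (dates == "2020-03-12")

# .loc accepts a boolean NumPy array of the same length as the frame.
safety.loc[mask, "Risk Rating"] = 9
print(safety)
```

Selecting columns by name avoids having to track column positions, and the assignment lands in the original frame rather than in a separate array.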

edited Feb 20 '21 at 11:10

answered Feb 20 '21 at 11:01


Thank you for the alternative solution. I need to process only 9 rows. But yes, NumPy does make it faster. – Hammad Malick Feb 20 '21 at 11:04
