I want to replace the values in the 'Risk Rating' column if and only if three conditions, on three different columns of the dataframe, are met. I tried both a boolean mask and the .loc method, but neither worked for me. The change should affect only 9 rows: for this single case, I want to change 'Risk Rating' from 0 to 9. The dataframe has 180002 rows. Here is the code that I wrote:

safety.loc[((safety['Employee Name']=="Shabbir Hussain") & (safety['Employee Number']==11231) & 
(safety['Attendance Date']=="2020-03-12")),['Risk Rating']]=9

mask = ((safety['Employee Name']=="Shakir Hussain") & (safety['Employee Number']==11026) & 
(safety['Attendance Date']=="2020-03-12") & (safety['Risk Rating']==0))
safety['Risk Rating'][mask]=9

2 Answers

mask = ((safety['Employee Name']=="Shakir Hussain") & 
        (safety['Employee Number']==11026) & 
        (safety['Attendance Date']=="2020-03-12") & 
        (safety['Risk Rating']==0))

To assign values conditionally, use .loc with the mask to select the matching rows together with the target column, and then assign the value.

safety.loc[mask, 'Risk Rating']=9
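
As a minimal sketch of the .loc assignment above, the following uses a small made-up dataframe (the three rows are hypothetical, not the asker's data). Counting the rows the mask selects before assigning is a quick way to confirm the conditions match what you expect.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `safety` dataframe.
safety = pd.DataFrame({
    "Employee Name": ["Shakir Hussain", "Shakir Hussain", "Someone Else"],
    "Employee Number": [11026, 11026, 20001],
    "Attendance Date": ["2020-03-12", "2020-03-13", "2020-03-12"],
    "Risk Rating": [0, 0, 0],
})

# All four conditions must hold; the outer parentheses allow the line breaks.
mask = (
    (safety["Employee Name"] == "Shakir Hussain")
    & (safety["Employee Number"] == 11026)
    & (safety["Attendance Date"] == "2020-03-12")
    & (safety["Risk Rating"] == 0)
)

# Number of rows the mask selects (the question expects 9 on the real data).
matched = int(mask.sum())

# Assign through a single .loc call so the dataframe itself is modified.
safety.loc[mask, "Risk Rating"] = 9
```

If the count is 0 on the real data, the values in the conditions (spelling, employee number, or column dtype) probably do not match what is stored in the dataframe.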

Alternatively, you can use numpy's np.select to apply the mask:

safety['Risk Rating'] = np.select([mask], [9], default=safety['Risk Rating'])
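
A short sketch of the np.select variant on made-up data (the dataframe contents are hypothetical). With a single condition, np.where gives the same result, which is shown here only for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real `safety` dataframe.
safety = pd.DataFrame({
    "Employee Name": ["Shakir Hussain", "Shakir Hussain", "Someone Else"],
    "Employee Number": [11026, 11026, 20001],
    "Attendance Date": ["2020-03-12", "2020-03-13", "2020-03-12"],
    "Risk Rating": [0, 0, 0],
})

mask = (
    (safety["Employee Name"] == "Shakir Hussain")
    & (safety["Employee Number"] == 11026)
    & (safety["Attendance Date"] == "2020-03-12")
    & (safety["Risk Rating"] == 0)
)

# Where the mask is True take 9, otherwise keep the existing rating.
via_select = np.select([mask], [9], default=safety["Risk Rating"])

# Equivalent for a single condition.
via_where = np.where(mask, 9, safety["Risk Rating"])

# Both calls return a new array; assign it back to update the column.
safety["Risk Rating"] = via_select
```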
ThePyGuy

Improving on @Bikhyat Adhiakri's answer: since you will be processing thousands of rows, consider using numpy instead:

import numpy as np

arr = safety.to_numpy()

# replace 0, 1, 2 with the positions of the three columns being compared,
# and 4 with the position of the 'Risk Rating' column
mask = (arr[:,0] == "Shakir Hussain") & (arr[:,1] == 11026) & (arr[:,2] == "2020-03-12")

arr[mask,4] = 9 # but the result is in the numpy array, not in the dataframe

# or you can use
# safety.loc[mask, 'Risk Rating'] = 9

With a large number of rows, numpy might make the process as much as 1000 times faster.

See: https://stackoverflow.com/a/64504183/11671779
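
One way to combine the two ideas, sketched on made-up data (the dataframe contents are hypothetical): build the mask from per-column numpy arrays, then write the result back with .loc so the dataframe itself is updated. No timing is measured here, so any speedup is an assumption to check on your own data.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `safety` dataframe.
safety = pd.DataFrame({
    "Employee Name": ["Shakir Hussain", "Shakir Hussain", "Someone Else"],
    "Employee Number": [11026, 11026, 20001],
    "Attendance Date": ["2020-03-12", "2020-03-13", "2020-03-12"],
    "Risk Rating": [0, 0, 0],
})

# Converting column by column keeps each column's own dtype; converting the
# whole mixed-type dataframe would produce a single object array instead.
names = safety["Employee Name"].to_numpy()
numbers = safety["Employee Number"].to_numpy()
dates = safety["Attendance Date"].to_numpy()

# Element-wise comparisons produce boolean arrays that combine with &.
mask = (names == "Shakir Hussain") & (numbers == 11026) & (dates == "2020-03-12")

# A boolean numpy array works as a row selector in .loc.
safety.loc[mask, "Risk Rating"] = 9
```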

Muhammad Yasirroni