0

I have a pandas dataframe where I've so far created something similar to the below table:

Parent      Child     Overall Risk            Bucket
0  CW12345  CW34565          Low       Low to High
1  CW12345  CW28394         High               N/A
2  CW12345  CW77646     Moderate  Moderate to High
3  CW12345  CW09871         High               N/A
4  CW12345  CW66542          Low   Low to Critical
5  CW12345  CW13356     Critical               N/A
6  CW12345  CW98452          Low   Low to Critical
7  CW12345  CW78899     Critical               N/A

The full report has about 1,000 Parents which in turn have x number of Children. The Overall Risk field is tied to the Child. The exercise is to evaluate if the Risk level of Child 2 is greater than Child 1 (and if 3 is greater than 2 and so on). I need the Bucket column to stop after it has identified a "to x" condition, such as Low to High within the same parent. Once it has identified Low to High, it should not identify any other "x to High" scenario and instead replace with N/A. Basically each combination of occurrences should only be able to appear once. Below is the kind of output I'm looking for (New Bucket column).

Parent    Child       Overall Risk            Bucket       New Bucket
0  CW12345  CW34565          Low       Low to High      Low to High
1  CW12345  CW28394         High               N/A              N/A
2  CW12345  CW77646     Moderate  Moderate to High              N/A
3  CW12345  CW09871         High               N/A              N/A
4  CW12345  CW66542          Low   Low to Critical  Low to Critical
5  CW12345  CW13356     Critical               N/A              N/A
6  CW12345  CW98452          Low   Low to Critical              N/A
7  CW12345  CW78899     Critical               N/A              N/A

This question is similar to this article but I need the python version. Any help is much appreciated. Ultimately this is being written to excel, so I need the solution to generate a column.

Alex Dowd
  • 39
  • 9
  • Please don't include your sample data as an image, it's impossible to copy/paste this for testing. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for some suggestions. – BeRT2me Nov 11 '22 at 20:45
  • you are basically looking for the highest value conditioned on the current highest value? – codeBB Nov 11 '22 at 20:48
  • Please, type `df.head(10).to_dict()` and copy and paste it to the question instead of the image. – Hamzah Nov 11 '22 at 20:51
  • edits made you can now copy and paste, thank you for the feedback. @codeBB yes basically find the first highest value (say anything "to high" is worth 3) and then replace all the other 3s with N/A. Hopefully that makes sense. – Alex Dowd Nov 11 '22 at 21:09

1 Answers1

0

Update: this solution worked for me without creating a new column.

cols = ['Bucket]
for i in cols:
    df[i] = df[i].mask(df.duplicates(['Parent'], i]))
Alex Dowd
  • 39
  • 9