How to return "True" if nth greatest value in column of data frame is a number >=x

Question

I have a dataframe that looks like this:

   prod_id  Fac     demand_near_wh
0   45906   Fac-1   0
1   45906   Fac-2   51
2   45906   Fac-3   30
3   45906   Fac-4   10
4   45906   Fac-5   52
5   48402   Fac-1   0
6   48402   Fac-2   72
7   48402   Fac-3   39
8   48402   Fac-4   14
9   48402   Fac-5   6

I want to create four new columns titled "2nd_wh", "3rd_wh", and "4th_wh", and "5th_wh". For each of these new columns I want to return "True" if the value in "demand_near_wh" is the nth highest number (i.e. 3rd highest, 4th highest, 5th highest) and >=30 for a given prod_id, otherwise return "False".

For example, in the "3rd_wh" column it will return "True" if the value in the "demand_near_wh" is the 3rd highest value for a given product_id and is >=30, otherwise "False". In the "4th_wh" column, it will return "True" if the value in the "demand_near_wh" for a given prod_id is the 4th highest value and is >=30, otherwise "False", etc.

The final output should look like this:

   prod_id  Fac     demand_near_wh 1st_wh 2nd_wh 3rd_wh 4th_wh  5th_wh  
0   45906   Fac-1   0               False  False  False  False  True 
1   45906   Fac-2   51              False  True   False  False  False
2   45906   Fac-3   30              False  False  True   False  False
3   45906   Fac-4   10              False  False  False  True   False
4   45906   Fac-5   52              True   False  False  False  False
5   48402   Fac-1   0               False  False  False  False  True
6   48402   Fac-2   72              True   False  False  False  False
7   48402   Fac-3   39              False  True   False  False  False
8   48402   Fac-4   14              False  False  False  True   False
9   48402   Fac-5   6               False  False  False  True   False

I've tried the code below, which worked for the "1st_wh" column..

df['1st_wh']=df.groupby(['prod_id'])['perc_dem_near_wh'].transform('max')==df['perc_dem_near_wh']
print(df.head(5))

   prod_id Facility  perc_dem_near_wh  1st_wh
0    45906    Fac-1                 0   False
1    45906    Fac-2                51   False
2    45906    Fac-3                30   False
3    45906    Fac-4                10   False
4    45906    Fac-5                52    True

However, when I populate the column "2nd_wh" with the code below, I get an error that says "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). I'm very new to Python and don't know how to fix it. Thanks in advance for anyone that can help.

df['2nd_wh'] = df['perc_dem_near_wh'].apply(lambda x: 'True' if x >= 30 and df['perc_dem_near_wh'].nlargest(2) else'False')
print (df.head(15))

Does this answer your question? https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o — Ted, Aug 25 '21 at 03:11

How to return "True" if nth greatest value in column of data frame is a number >=x

0 Answers0