I have a dataframe that looks like this:
prod_id Fac demand_near_wh
0 45906 Fac-1 0
1 45906 Fac-2 51
2 45906 Fac-3 30
3 45906 Fac-4 10
4 45906 Fac-5 52
5 48402 Fac-1 0
6 48402 Fac-2 72
7 48402 Fac-3 39
8 48402 Fac-4 14
9 48402 Fac-5 6
I want to create four new columns titled "2nd_wh"
, "3rd_wh"
, and "4th_wh"
, and "5th_wh"
. For each of these new columns I want to return "True"
if the value in "demand_near_wh"
is the nth highest number (i.e. 3rd highest, 4th highest, 5th highest) and >=30
for a given prod_id
, otherwise return "False"
.
For example, in the "3rd_wh"
column it will return "True"
if the value in the "demand_near_wh"
is the 3rd highest value for a given product_id and is >=30
, otherwise "False"
. In the "4th_wh"
column, it will return "True"
if the value in the "demand_near_wh"
for a given prod_id
is the 4th highest value and is >=30
, otherwise "False"
, etc.
The final output should look like this:
prod_id Fac demand_near_wh 1st_wh 2nd_wh 3rd_wh 4th_wh 5th_wh
0 45906 Fac-1 0 False False False False True
1 45906 Fac-2 51 False True False False False
2 45906 Fac-3 30 False False True False False
3 45906 Fac-4 10 False False False True False
4 45906 Fac-5 52 True False False False False
5 48402 Fac-1 0 False False False False True
6 48402 Fac-2 72 True False False False False
7 48402 Fac-3 39 False True False False False
8 48402 Fac-4 14 False False False True False
9 48402 Fac-5 6 False False False True False
I've tried the code below, which worked for the "1st_wh" column..
df['1st_wh']=df.groupby(['prod_id'])['perc_dem_near_wh'].transform('max')==df['perc_dem_near_wh']
print(df.head(5))
prod_id Facility perc_dem_near_wh 1st_wh
0 45906 Fac-1 0 False
1 45906 Fac-2 51 False
2 45906 Fac-3 30 False
3 45906 Fac-4 10 False
4 45906 Fac-5 52 True
However, when I populate the column "2nd_wh" with the code below, I get an error that says "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I'm very new to Python and don't know how to fix it. Thanks in advance for anyone that can help.
df['2nd_wh'] = df['perc_dem_near_wh'].apply(lambda x: 'True' if x >= 30 and df['perc_dem_near_wh'].nlargest(2) else'False')
print (df.head(15))