
I have a pandas dataframe that looks something like this:

Item    Status
123     B
123     BW
123     W 
123     NF
456     W
456     BW
789     W
789     NF
000     NF

And I need to create a new column Value which will be either 1 or 0 depending on the values in the Item and Status columns. The assignment of the value 1 is prioritized by this order: B, BW, W, NF. So, using the sample dataframe above, the result should be:

Item    Status    Value
123     B         1
123     BW        0
123     W         0
123     NF        0
456     W         0
456     BW        1
789     W         1
789     NF        0
000     NF        1

Using Python 3.7.
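For reference, here is one way the sample dataframe might be constructed (an assumption about how the data is stored; Item is kept as a string so that 000 keeps its leading zeros):

import pandas as pd

# sample data from the table above; storing Item as a string is an
# assumption made so that '000' keeps its leading zeros
df = pd.DataFrame({
    'Item':   ['123', '123', '123', '123', '456', '456', '789', '789', '000'],
    'Status': ['B', 'BW', 'W', 'NF', 'W', 'BW', 'W', 'NF', 'NF'],
})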

erik7970
  • How does the `Item` column play a role in this? Do you assign only one `1` value per `Item`, based on the order of priority in `Status`? – AlexK Apr 14 '19 at 21:28
  • If there is only one instance of a certain item in the `Item` column, it should be `1` by default. However, when there is more than one occurrence of an item, then, based on the `Status` and the priority, there should be a corresponding value of either `1` or `0`. Hope that makes sense. – erik7970 Apr 14 '19 at 21:34

3 Answers


Taking your original dataframe as the input df, the following code will produce your desired output:

import numpy as np

# dictionary assigning an order of priority to Status values (1 = highest priority)
priority_map = {'B': 1, 'BW': 2, 'W': 3, 'NF': 4}

# temporary column that converts Status values to their priority rank
df['rank'] = df['Status'].map(priority_map)

# dictionary with Item as key and the lowest (best) rank per Item as value
lowest_val_dict = df.groupby('Item')['rank'].min().to_dict()

# new column that assigns that best rank to all rows of each Item
df['Value'] = df['Item'].map(lowest_val_dict)

# set Value to 1 where the row's rank matches its Item's best rank, else 0
df['Value'] = np.where(df['Value'] == df['rank'], 1, 0)

# drop the temporary rank column
del df['rank']
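Applied to the sample dataframe above, this should reproduce the expected Value column:

print(df)
#   Item Status  Value
# 0  123      B      1
# 1  123     BW      0
# 2  123      W      0
# 3  123     NF      0
# 4  456      W      0
# 5  456     BW      1
# 6  789      W      1
# 7  789     NF      0
# 8  000     NF      1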
AlexK

I would prefer an approach where the status is an ordered pd.Categorical, because a) that's what it actually is and b) it's much more readable: once you have that, you just check whether each value is equal to the max of its group:

df['Status'] = pd.Categorical(df['Status'], categories=['NF', 'W', 'BW', 'B'],
                              ordered=True)
df['Value'] = df.groupby('Item')['Status'].apply(lambda x: (x == x.max()).astype(int))

#   Item Status  Value
#0   123      B      1
#1   123     BW      0
#2   123      W      0
#3   123     NF      0
#4   456      W      0
#5   456     BW      1
#6   789      W      1
#7   789     NF      0
#8     0     NF      1
Jondiedoop
  • The last line would be better as `df['Value'] = df.groupby('Item')['Status'].transform('max').eq(df['Status']).astype(int)` – jezrael Apr 20 '19 at 12:17
  • You could consider it better, since it is a tiny bit faster for very large dataframes, although I'd argue it is less readable. Anyway, good comment, interesting approach – Jondiedoop Apr 20 '19 at 17:50

I might be able to help you conceptually by explaining the steps I would take:

  1. Create the new column Value and fill it with zeros, e.g. df['Value'] = 0 (or use fillna(0))
  2. Group the dataframe by Item with grouped = df.groupby('Item')
  3. Iterate through all the groups found: for name, group in grouped:
  4. Using a simple function with ifs, a custom priority queue, custom sorting criteria, or any other preferred method, determine which entry has the highest priority ("the assignment of the value 1 is prioritized by this order: B, BW, W, NF"), and assign a value of 1 to its Value column with df.loc[entry, 'Value'] = 1

    Let's say we are looking at group '123':

     Item    Status    Value
     -------------------------
     123     B         0 (before 0, after 1)
     123     BW        0
     123     W         0
     123     NF        0
    

    Because the row [123, 'B', 0] had the highest priority based on your criteria, you change it to [123, 'B', 1]

  5. When finished, rebuild the dataframe from the groupby object if needed, and you're done. There are several ways to do that; see Converting a Pandas GroupBy object to DataFrame. A rough sketch of steps 1–4 is shown after this list.
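A minimal sketch of those steps, assuming the priority mapping below (just one way to encode the order B, BW, W, NF from the question), might look like this:

# priority order from the question, lower number = higher priority
priority = {'B': 0, 'BW': 1, 'W': 2, 'NF': 3}

df['Value'] = 0                               # step 1: fill the new column with zeros
for name, group in df.groupby('Item'):        # steps 2-3: group by Item and iterate
    # step 4: index of the row with the highest-priority Status in this group
    best_idx = group['Status'].map(priority).idxmin()
    df.loc[best_idx, 'Value'] = 1             # assign 1 to that row in the original df

Because this writes into df directly with .loc, step 5 (rebuilding the dataframe from the groupby object) is not strictly needed in this particular sketch.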

mihai