
I have a pandas dataframe that looks something like this:

Item    Status
123     B
123     BW
123     W 
123     NF
456     W
456     BW
789     W
789     NF
000     NF

And I need to create a new column Value which will be either 1 or 0 depending on the values in the Item and Status columns. The assignment of the value 1 is prioritized by this order: B, BW, W, NF. So, using the sample dataframe above, the result should be:

Item    Status    Value
123     B         1
123     BW        0
123     W         0
123     NF        0
456     W         0
456     BW        1
789     W         1
789     NF        0
000     NF        1

Using Python 3.7.
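For reference, here is one way the sample dataframe might be constructed (an assumption about how the data is stored; Item is kept as a string so that 000 keeps its leading zeros):

import pandas as pd

# sample data from the table above; storing Item as a string is an
# assumption made so that '000' keeps its leading zeros
df = pd.DataFrame({
    'Item':   ['123', '123', '123', '123', '456', '456', '789', '789', '000'],
    'Status': ['B', 'BW', 'W', 'NF', 'W', 'BW', 'W', 'NF', 'NF'],
})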

erik7970
  • How does the `Item` column play a role in this? Do you assign only one `1` value per `Item`, based on the order of priority in `Status`? – AlexK Apr 14 '19 at 21:28
  • If there is only one instance of a certain item in the `Item` column, it should be `1` by default. However, when there is more than one occurrence of an item, then, based on the `Status` and the priority, there should be a corresponding value of either `1` or `0`. Hope that makes sense. – erik7970 Apr 14 '19 at 21:34

3 Answers


Taking your original dataframe as the input df, the following code will produce your desired output:

import numpy as np

# dictionary assigning an order of priority to Status values (1 = highest priority)
priority_map = {'B': 1, 'BW': 2, 'W': 3, 'NF': 4}

# temporary column that converts Status values to their priority rank
df['rank'] = df['Status'].map(priority_map)

# dictionary with Item as key and the lowest (best) rank per Item as value
lowest_val_dict = df.groupby('Item')['rank'].min().to_dict()

# new column that assigns that best rank to all rows of each Item
df['Value'] = df['Item'].map(lowest_val_dict)

# set Value to 1 where the row's rank matches its Item's best rank, else 0
df['Value'] = np.where(df['Value'] == df['rank'], 1, 0)

# drop the temporary rank column
del df['rank']
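Applied to the sample dataframe above, this should reproduce the expected Value column:

print(df)
#   Item Status  Value
# 0  123      B      1
# 1  123     BW      0
# 2  123      W      0
# 3  123     NF      0
# 4  456      W      0
# 5  456     BW      1
# 6  789      W      1
# 7  789     NF      0
# 8  000     NF      1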
AlexK

I would prefer an approach where the status is an ordered pd.Categorical, because a) that's what it actually is and b) it's much more readable: once you have that, you just check whether each value is equal to the max of its group:

df['Status'] = pd.Categorical(df['Status'], categories=['NF', 'W', 'BW', 'B'],
                              ordered=True)
df['Value'] = df.groupby('Item')['Status'].apply(lambda x: (x == x.max()).astype(int))

#   Item Status  Value
#0   123      B      1
#1   123     BW      0
#2   123      W      0
#3   123     NF      0
#4   456      W      0
#5   456     BW      1
#6   789      W      1
#7   789     NF      0
#8     0     NF      1
Jondiedoop
  • The last line would be better as `df['Value'] = df.groupby('Item')['Status'].transform('max').eq(df['Status']).astype(int)` – jezrael Apr 20 '19 at 12:17
  • You could consider it better, since it is a tiny bit faster for very large dataframes, although I'd argue it is less readable. Anyway, good comment, interesting approach – Jondiedoop Apr 20 '19 at 17:50

I might be able to help you conceptually by explaining the steps I would take:

  1. Create the new column Value and fill it with zeros, e.g. df['Value'] = 0 (or use fillna(0))
  2. Group the dataframe by Item with grouped = df.groupby('Item')
  3. Iterate through all the groups found: for name, group in grouped:
  4. Using a simple function with ifs, a custom priority queue, custom sorting criteria, or any other preferred method, determine which entry has the highest priority ("the assignment of the value 1 is prioritized by this order: B, BW, W, NF"), and assign a value of 1 to its Value column with df.loc[entry, 'Value'] = 1

    Let's say we are looking at group '123':

     Item    Status    Value
     -------------------------
     123     B         0 (before 0, after 1)
     123     BW        0
     123     W         0
     123     NF        0
    

    Because the row [123, 'B', 0] had the highest priority based on your criteria, you change it to [123, 'B', 1]

  5. When finished, rebuild the dataframe from the groupby object if needed, and you're done. There are several ways to do that; see Converting a Pandas GroupBy object to DataFrame. A rough sketch of steps 1–4 is shown after this list.
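A minimal sketch of those steps, assuming the priority mapping below (just one way to encode the order B, BW, W, NF from the question), might look like this:

# priority order from the question, lower number = higher priority
priority = {'B': 0, 'BW': 1, 'W': 2, 'NF': 3}

df['Value'] = 0                               # step 1: fill the new column with zeros
for name, group in df.groupby('Item'):        # steps 2-3: group by Item and iterate
    # step 4: index of the row with the highest-priority Status in this group
    best_idx = group['Status'].map(priority).idxmin()
    df.loc[best_idx, 'Value'] = 1             # assign 1 to that row in the original df

Because this writes into df directly with .loc, step 5 (rebuilding the dataframe from the groupby object) is not strictly needed in this particular sketch.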

mihai