1

Working with the dataframe df:

Product_ID | Category_A   | Category _B
1232             0              0 
1343             Unknown        X
2543             Nan            0 
2549             Y              Y
0349             X              X
8533             Y              X

I would like to create a new column Category_Final, with the following rules:

  • If Category_A is 0, Unknown or Nan, Category_Final should be "Unknown"
  • If Category_A is the Same as Category_B, Category_Final should be 0
  • If Category_A is different than Category_B,Category_Final should be X

Expected Output:

Product_ID | Category_A   | Category _B | Category_Final
1232             0              0            Unknown
1343             Unknown        X            Unknown
2543             Nan            0            Unknown
2549             Y              Y            0
0349             X              X            0
8533             Y              X            X

I managed to get the logic for 0 and X, but I don't know how to include the Unknown Logic.

df['Category_Final'] = np.where(df['Category_A'] != df['Category_B'], 'X', '0')

Thank you!

jeangelj
  • 4,338
  • 16
  • 54
  • 98
  • 1
    I think in these cases that numpy select gives the most readable solution: [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) – JohnE Apr 05 '18 at 16:55

3 Answers3

2

After your current line, try this:

mask = ((df.Category_A.isnull()) | 
        (df.Category_A == 'Unknown') | 
        (df.Category_A == 0))
df.loc[mask, 'Category_Final'] = 'Unknown'
Peter Leimbigler
  • 10,775
  • 1
  • 23
  • 37
2

You can use nested np.where

df['Category_Final'] = np.where((df['Category_A'].isnull() | \
                                              (df['Category_A'] == 'Unknown') | (df['Category_A'] == '0')),\
                                              'Unknown', np.where(df['Category_A'] == \
                                                                  df['Category_B'], 0, 'X'))

Output

Product_ID  Category_A  Category_B  Category_Final
0   1232    0            0            Unknown
1   1343    Unknown      X            Unknown
2   2543    NaN          0            Unknown
3   2549    Y            Y              0
4   349     X            X              0
5   8533    Y            X              X
pythonic833
  • 3,054
  • 1
  • 12
  • 27
1
df['Category_Final'] = (
    df.apply(lambda _: "0", axis=1)
    .where(df['Category_A'] == df['Category_B'], "X")
    .where(~df['Category_A'].isin(["0", "Unknown", np.NaN]), "Unknown")
)
pomber
  • 23,132
  • 10
  • 81
  • 94