Checking a column in a Pandas df does not contain certain text

Question

I am attempting to write a specific value to a column in a pandas df depending on if another column does or does not contain certain text. I have 3 outputs in the destination columns: Distributor, OEM and End User.

import pandas as pd
df = pd.read_excel("Customer Records.xlsx")

#Checking for distributor pricing tag
df.loc[df['CustomerRoles'].str.contains('Discount-Distributor-STD'), 'Customer Type'] = 'Distributor'
    
#Checking for OEM pricing tag
df.loc[df['CustomerRoles'].str.contains('Discount-OEM-STD'),         'Customer Type'] = 'OEM'

Those lines of code work as intended.

The last thing I need is for every other string that contains neither of those phrases to print "End User" in the 'Customer Type' column.

score 3 · Accepted Answer · answered May 22 '23 at 21:35

You can use np.select:

conds = [
    df['CustomerRoles'].str.contains('Discount-Distributor-STD'),
    df['CustomerRoles'].str.contains('Discount-OEM-STD')
]
choices = ['Distributor', 'OEM']

df['Customer Type'] = np.select(condlist=conds, choicelist=choices, default='End User')

score 1 · Answer 2 · answered May 22 '23 at 21:24

| means "or", and ~ flips the boolean, so False -> True. You can combine these together:

df.loc[~(df['CustomerRoles'].str.contains('Discount-Distributor-STD')\
         | df['CustomerRoles'].str.contains('Discount-OEM-STD')),
       "Customer Type"] = "End User"

There is a very good explanation of the different logical operators here by @cs95.

Checking a column in a Pandas df does not contain certain text

2 Answers2