Create Pandas DataFrame column based on substring values of another column

Question

I need to assign values to the 'group' column of a Pandas DataFrame based on substrings found in another column. Example DataFrame:

import pandas as pd

groups = ['customer', 'supplier', 'irrelevant', 'spam', 'invoice', 'shipping advice']

df = pd.DataFrame({
    'mailLabels': ['customers/AcmeBar', 'suppliers/AcmeBaz', 'irrelevant', 'spam', 'invoice', 'shipping advice'],
    'group': ['na', 'na', 'na', 'na', 'na', 'na']})

My solution works, but it is extremely cumbersome because the real number of groups is much larger than in this example:

import numpy as np  # pd.np is deprecated; import NumPy directly

df['group'] = np.where(df.mailLabels.str.contains("customer"), "sales",
              np.where(df.mailLabels.str.contains("supplier"), "procurement",
              np.where(df.mailLabels.str.contains("irrelevant"), "not important",
              np.where(df.mailLabels.str.contains("spam"), "not important", "other"))))

print(df)

          mailLabels          group
0  customers/AcmeBar          sales
1  suppliers/AcmeBaz    procurement
2         irrelevant  not important
3               spam  not important
4            invoice          other
5    shipping advice          other

Is there a vectorised solution to this problem? An approach based on splitting the mailLabels column does not work for me because the data is messy.
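
One possible way to flatten the nested where calls is np.select, which takes a list of boolean masks and a matching list of values and picks the first match per row. The sketch below is only an illustration under assumptions: the rules list is a hypothetical name, and its pairs are copied from the example above rather than from the real 31 groups.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'mailLabels': ['customers/AcmeBar', 'suppliers/AcmeBaz', 'irrelevant',
                   'spam', 'invoice', 'shipping advice']})

# Hypothetical (substring, group) pairs taken from the example above.
# Order matters: the first rule that matches a row wins.
rules = [
    ('customer', 'sales'),
    ('supplier', 'procurement'),
    ('irrelevant', 'not important'),
    ('spam', 'not important'),
]

# Build one boolean mask per rule. regex=False treats each substring literally,
# and na=False makes missing labels count as "no match".
conditions = [df['mailLabels'].str.contains(sub, regex=False, na=False)
              for sub, _ in rules]
choices = [group for _, group in rules]

# Rows that match no rule fall back to 'other'.
df['group'] = np.select(conditions, choices, default='other')
print(df)
```

Adding a group then means adding one pair to the list instead of nesting another call. Each rule still scans the whole column once, so the cost grows with the number of rules.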

How many groups do you have? Are the groups mapped to keys in a dict? — It_is_Chris, Dec 30 '19 at 19:03
I have 31 groups in a list but I could map them in a dict if this would help. — andrej, Dec 30 '19 at 20:27
@YO and BEN_W I checked [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) but I could not find any reference to my real problem, matching substring. — andrej, Dec 30 '19 at 20:52
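
If the groups are put into a dict, as suggested in the comments, one option is to extract the matching substring with a single regular expression and then map it to its group. This is a minimal sketch, not a confirmed answer: the mapping dict and its contents are assumptions based on the example in the question.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'mailLabels': ['customers/AcmeBar', 'suppliers/AcmeBaz', 'irrelevant',
                   'spam', 'invoice', 'shipping advice']})

# Hypothetical substring -> group mapping based on the example in the question.
mapping = {
    'customer': 'sales',
    'supplier': 'procurement',
    'irrelevant': 'not important',
    'spam': 'not important',
}

# One alternation pattern with a single capture group, e.g. "(customer|supplier|...)".
# re.escape keeps special characters in the keys from being read as regex syntax.
pattern = '(' + '|'.join(re.escape(key) for key in mapping) + ')'

# expand=False returns a Series; rows without a match become NaN.
matched = df['mailLabels'].str.extract(pattern, expand=False)

# Translate the extracted substring to its group; unmatched rows become 'other'.
df['group'] = matched.map(mapping).fillna('other')
print(df)
```

This scans the column once regardless of how many keys the dict holds. If a label contains more than one key, the leftmost match in the string is the one extracted, which may differ from the priority order of the nested where version.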
