
I need to assign values to the 'group' column of a pandas DataFrame based on a substring found in another column. Example DataFrame:

import pandas as pd

groups = ['customer', 'supplier', 'irrelevant', 'spam', 'invoice', 'shipping advice']

df = pd.DataFrame({
    'mailLabels': ['customers/AcmeBar', 'suppliers/AcmeBaz', 'irrelevant', 'spam', 'invoice', 'shipping advice'],
    'group': ['na', 'na', 'na', 'na', 'na', 'na']})

My solution works, but it is extremely cumbersome because the real number of groups is much larger than in this example:

import numpy as np  # pd.np is deprecated and was removed in pandas 2.0

df['group'] = np.where(df.mailLabels.str.contains("customer"), "sales",
              np.where(df.mailLabels.str.contains("supplier"), "procurement",
              np.where(df.mailLabels.str.contains("irrelevant"), "not important",
              np.where(df.mailLabels.str.contains("spam"), "not important", "other"))))

print(df)

          mailLabels          group
0  customers/AcmeBar          sales
1  suppliers/AcmeBaz    procurement
2         irrelevant  not important
3               spam  not important
4            invoice          other
5    shipping advice          other

Is there a vectorised solution to this problem? This one does not work for me, because the data is too messy to split the mailLabels column reliably.
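One vectorised option is `np.select`: it takes a list of boolean conditions and a parallel list of choices and, for each row, returns the choice of the first condition that is true (or a default). A minimal sketch, assuming the same substring-to-group rules as the nested `np.where` example; the `rules` list is a hypothetical stand-in for the real groups:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'mailLabels': ['customers/AcmeBar', 'suppliers/AcmeBaz', 'irrelevant',
                   'spam', 'invoice', 'shipping advice'],
})

# Hypothetical (substring, group) rules; order matters, the first match wins.
rules = [
    ('customer', 'sales'),
    ('supplier', 'procurement'),
    ('irrelevant', 'not important'),
    ('spam', 'not important'),
]

# One boolean mask per rule; regex=False matches the substring literally,
# na=False treats missing labels as "no match".
conditions = [df['mailLabels'].str.contains(sub, regex=False, na=False)
              for sub, _ in rules]
choices = [group for _, group in rules]

# For each row, take the choice of the first True condition, else 'other'.
df['group'] = np.select(conditions, choices, default='other')
print(df)
```

Because `np.select` picks the first matching condition, the order of `rules` reproduces the priority of the nested `np.where` calls, and adding a group means appending one pair instead of nesting another call.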

andrej
  • See `np.select`. – ansev Dec 30 '19 at 18:58
  • How many groups do you have? Are the groups mapped to keys in a dict? – It_is_Chris Dec 30 '19 at 19:03
  • I have 31 groups in a list, but I could map them in a dict if that would help. – andrej Dec 30 '19 at 20:27
  • @YO and BEN_W I checked [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column), but I could not find anything about my actual problem: matching substrings. – andrej Dec 30 '19 at 20:52
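If the groups are stored in a dict, another vectorised sketch joins all dict keys into one regex alternation, pulls out the key that matches each label with `Series.str.extract`, and maps that key to its group. The `mapping` dict below is a hypothetical stand-in for the real 31 groups:

```python
import re
import pandas as pd

df = pd.DataFrame({
    'mailLabels': ['customers/AcmeBar', 'suppliers/AcmeBaz', 'irrelevant',
                   'spam', 'invoice', 'shipping advice'],
})

# Hypothetical substring -> group dict; the real one would hold all groups.
mapping = {
    'customer': 'sales',
    'supplier': 'procurement',
    'irrelevant': 'not important',
    'spam': 'not important',
}

# One capturing alternation; re.escape keeps each key a literal substring.
pattern = '(' + '|'.join(map(re.escape, mapping)) + ')'

# Extract the matching key per row as a Series (NaN where nothing matches).
matched = df['mailLabels'].str.extract(pattern, expand=False)

# Translate keys to group names; unmatched rows become 'other'.
df['group'] = matched.map(mapping).fillna('other')
print(df)
```

Note that the regex returns the leftmost match within each label, so if a label contains several keys, the one appearing earliest in the text wins; dict order only breaks ties at the same position.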
