I need to assign values to the 'group' column of a pandas DataFrame based on substrings found in another column. Example DataFrame:
import pandas as pd
groups = ['customer', 'supplier', 'irrelevant', 'spam', 'invoice', 'shipping advice']
df = pd.DataFrame({
    'mailLabels': ['customers/AcmeBar', 'suppliers/AcmeBaz', 'irrelevant', 'spam', 'invoice', 'shipping advice'],
    'group': ['na', 'na', 'na', 'na', 'na', 'na']})
My solution works, but it is extremely cumbersome because the real number of groups is much larger than in this example:
import numpy as np
# Nested np.where: each level tests one substring and falls through to the next
df['group'] = np.where(df.mailLabels.str.contains("customer"), "sales",
              np.where(df.mailLabels.str.contains("supplier"), "procurement",
              np.where(df.mailLabels.str.contains("irrelevant"), "not important",
              np.where(df.mailLabels.str.contains("spam"), "not important", "other"))))
print(df)
          mailLabels          group
0  customers/AcmeBar          sales
1  suppliers/AcmeBaz    procurement
2         irrelevant  not important
3               spam  not important
4            invoice          other
5    shipping advice          other
Is there a vectorised solution to this problem? This one does not work for me because the data is too messy to split the mailLabels column.
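For reference, one direction that might scale better than nesting is to keep the substring-to-group pairs in a dict and pass the resulting masks to np.select. This is only a sketch: the name substring_to_group and its contents are made up for illustration, and it assumes each label should take the group of the first substring that matches, with 'other' as the fallback.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'mailLabels': ['customers/AcmeBar', 'suppliers/AcmeBaz', 'irrelevant',
                   'spam', 'invoice', 'shipping advice'],
})

# Hypothetical substring -> group mapping; extend it with as many entries as needed.
# Order matters: np.select uses the first condition that is True for each row.
substring_to_group = {
    'customer': 'sales',
    'supplier': 'procurement',
    'irrelevant': 'not important',
    'spam': 'not important',
}

# One boolean mask per substring. regex=False treats each key as a literal string,
# and na=False makes missing labels count as "no match".
conditions = [
    df['mailLabels'].str.contains(sub, regex=False, na=False).to_numpy()
    for sub in substring_to_group
]
choices = list(substring_to_group.values())

# Rows that match none of the substrings get the default value.
df['group'] = np.select(conditions, choices, default='other')
print(df)
```

Each str.contains call is vectorised over the column, so the Python-level loop runs once per substring rather than once per row, and adding a group only means adding a dict entry.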