You could apply a lambda function to "classification" that checks, for each column name in cols, whether it appears in the row's list:
cols = ['apple','banana','peach','grape']
df[cols] = df['classification'].apply(lambda x: [1 if col in x else 0 for col in cols]).tolist()
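As a self-contained sketch of the lambda approach: the sample frame below is an assumption (its second row is made up for illustration), and the names `flags` and `out` are hypothetical. It joins the flags as a new DataFrame rather than assigning to `df[cols]` directly, which is just one way to attach them.

```python
import pandas as pd

# Hypothetical sample data; the second row is invented for illustration.
df = pd.DataFrame({
    'classification': [['apple', 'grape'], ['banana']],
    'text': ['anytext', 'othertext'],
})
cols = ['apple', 'banana', 'peach', 'grape']

# For each row, build one 0/1 flag per column name in cols.
flags = df['classification'].apply(
    lambda x: [1 if col in x else 0 for col in cols]
).tolist()

# Attach the flags as new columns, aligned on the original index.
out = df.join(pd.DataFrame(flags, index=df.index, columns=cols))
print(out)
```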
Another option is to use explode + stack + fillna to get a blank Series whose MultiIndex consists of the original index, the "classification" items, and the column names of df (this assumes df already contains the columns in cols). Then compare each "classification" item with the column name to build a Series of 0/1 flags, and use unstack + groupby + sum to build a DataFrame to assign back to df:
tmp = df.explode('classification')
s = tmp.set_index([tmp.index, tmp['classification']])[cols].fillna(0).stack()
s = pd.Series((s.index.get_level_values(1)==s.index.get_level_values(2)).astype(int), index=s.index)
df[cols] = s.unstack().groupby(level=0).sum()
An even simpler option is to use explode + pd.get_dummies + groupby + sum to turn the items in "classification" into dummy variables, then fill the columns of df from the result with fillna:
df[cols] = df[cols].fillna(pd.get_dummies(df['classification'].explode()).groupby(level=0).sum()).fillna(0)
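A minimal end-to-end sketch of the get_dummies approach, again on an assumed sample frame (the second row and the names `dummies` and `result` are hypothetical). Instead of filling existing columns with fillna, this variant reindexes the dummies to `cols` so that items that never occur (here "peach") still get a column of zeros.

```python
import pandas as pd

# Hypothetical sample data; the second row is invented for illustration.
df = pd.DataFrame({
    'classification': [['apple', 'grape'], ['banana']],
    'text': ['anytext', 'othertext'],
})
cols = ['apple', 'banana', 'peach', 'grape']

dummies = (
    pd.get_dummies(df['classification'].explode())  # one indicator row per list item
      .groupby(level=0).sum()                       # collapse back to one row per original index
      .reindex(columns=cols, fill_value=0)          # add columns for items that never occur
      .astype(int)
)

result = df.join(dummies)
print(result)
```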
Output:
   classification     text  apple  banana  peach  grape
0  [apple, grape]  anytext      1       0      0      1