I am still new to Python and pandas. I am working on improving a keyword assessment. My DataFrame looks like this:
    Name  Description
    Dog   Dogs are in the house
    Cat   Cats are in the shed
    Cat   Categories of cats are concatenated
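To make this easier to reproduce, here is a minimal sketch that builds the same frame. The values are copied from the table above; the variable name df matches what I use below.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "Name": ["Dog", "Cat", "Cat"],
    "Description": [
        "Dogs are in the house",
        "Cats are in the shed",
        "Categories of cats are concatenated",
    ],
})
print(df)
```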
I am using a keyword list like this: ['house', 'shed', 'in']
My lambda function looks like this:
    keyword_agg = lambda x: ' ,'.join(x) if x is not 'skip me' else None
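I am using a function to identify and score each row for keyword matches:

    import numpy as np

    def foo(df, words):
        col_list = []
        key_list = []
        for w in words:
            pattern = w
            # 1 if the description contains the keyword, else 0
            df[w] = np.where(df.Description.str.contains(pattern), 1, 0)
            # the keyword itself on a match, otherwise the placeholder 'skip me'
            df[w + 'keyword'] = np.where(df.Description.str.contains(pattern), w,
                                         'skip me')
            col_list.append(w)
            key_list.append(w + 'keyword')
        # total number of matched keywords per row
        df['score'] = df[col_list].sum(axis=1)
        # join the per-keyword columns into one string per row
        df['keywords'] = df[key_list].apply(keyword_agg, axis=1)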
For each keyword, the function creates a column named after the word that holds 1 or 0 depending on whether the description matches. It also creates a column named word + 'keyword' that holds the word itself on a match, or 'skip me' otherwise.
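To show what those intermediate columns look like, here is a small sketch for a single keyword ('house') on the sample descriptions. It only repeats the two np.where steps from my function outside the loop; the variable name matches is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Description": [
        "Dogs are in the house",
        "Cats are in the shed",
        "Categories of cats are concatenated",
    ],
})

w = "house"
# Boolean mask: True where the description contains the keyword.
matches = df["Description"].str.contains(w)

# Score column: 1 on a match, 0 otherwise.
df[w] = np.where(matches, 1, 0)
# Keyword column: the word on a match, the placeholder otherwise.
df[w + "keyword"] = np.where(matches, w, "skip me")

print(df[[w, w + "keyword"]])
```

Only the first row matches 'house', so the other two rows get 0 and 'skip me'.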
I am expecting the apply call to work like this:
    df['keywords'] = df[key_list].apply(keyword_agg, axis=1)
and to return:
Keywords
in, house
in, shed
None
Instead I am getting:
Keywords
in, 'skip me' , house
in, 'skip me', shed
'skip me', 'skip me' , 'skip me'
Can someone help explain why the 'skip me' strings show up when I am trying to exclude them?
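In case it helps narrow this down, here is a diagnostic sketch that checks what apply hands to the function when axis=1. The key_cols frame is a hand-written stand-in for df[key_list], and join_matches is a hypothetical helper, not part of my original code. It filters each value inside the row instead of comparing the whole row to 'skip me'.

```python
import pandas as pd

# Hand-written stand-in for df[key_list] (assumed values, based on the sample data).
key_cols = pd.DataFrame({
    "housekeyword": ["house", "skip me", "skip me"],
    "shedkeyword": ["skip me", "shed", "skip me"],
    "inkeyword": ["in", "in", "skip me"],
})

# With axis=1 the function receives one whole row at a time.
row_types = key_cols.apply(lambda x: type(x).__name__, axis=1)
print(row_types.tolist())

def join_matches(row):
    # Hypothetical helper: drop the placeholder value by value, then join.
    kept = [value for value in row if value != "skip me"]
    return ", ".join(kept) if kept else None

joined = key_cols.apply(join_matches, axis=1)
print(joined.tolist())
```

If each x is a whole row rather than a single string, then the test x is not 'skip me' in my lambda would always be true, which might explain why nothing gets skipped.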