
I am still new to Python and pandas, and I am working on improving a keyword assessment. My DataFrame looks like this:

Name  Description 
Dog   Dogs are in the house
Cat   Cats are in the shed
Cat   Categories of cats are concatenated

I am using a keyword list like this ['house', 'shed', 'in']

My lambda function looks like this

keyword_agg = lambda x: ' ,'.join x if x is not 'skip me' else None

I am using a function to identify and score each row for keyword matches

def foo (df, words):
    col_list = []
    key_list= []
    for w in words:
        pattern = w
        df[w] = np.where(df.Description.str.contains(pattern), 1, 0)
        df[w +'keyword'] = np.where(df.Description.str.contains(pattern), w, 
                          'skip me')
        col_list.append(w)
        key_list.append(w + 'keyword')
    df['score'] = df[col_list].sum(axis=1)
    df['keywords'] = df[key_list].apply(keyword_agg, axis=1)

For each keyword, the function adds a column named after the word, holding 1 or 0 depending on whether the description matches. It also adds a 'word + keyword' column holding either the word itself or 'skip me' for each row.

I am expecting the apply to work like this

df['keywords'] = df[key_list].apply(keyword_agg, axis=1)

Returns

Keywords
in, house
in, shed
None

Instead I am getting

Keywords
in, 'skip me' , house
in, 'skip me', shed
'skip me', 'skip me' , 'skip me'

Can someone help explain why the 'skip me' strings show up when I am trying to exclude them?

DataNoob
  • `is not` tests identity. You want `x != "skip me"`. See [Why does comparing strings in Python using either '==' or 'is' sometimes produce a different result?](https://stackoverflow.com/questions/1504717/why-does-comparing-strings-in-python-using-either-or-is-sometimes-produce) – TemporalWolf Jul 17 '17 at 21:07
  • First of all, why are you using `lambda` at all? You are assigning it to a name, thereby removing the *only advantage that `lambda` has*: that it is anonymous. Second, I'm pretty sure `keyword_agg = lambda x: ' ,'.join x if x is not 'skip me' else None` is a SyntaxError. – juanpa.arrivillaga Jul 17 '17 at 21:20

1 Answer


The `is` operator (and `is not`) checks reference identity: whether two names refer to the same object, not whether their values are equal.
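As a small illustration of the difference, here is a minimal sketch. The variable names are made up for this example, and the identity result reflects how CPython typically behaves for strings built at runtime; it is an implementation detail rather than a language guarantee.

```python
# Two strings with the same value, built in different ways.
literal = 'skip me'
built = ' '.join(['skip', 'me'])  # constructed at runtime, so usually a separate object

# Value comparison: True whenever the characters match.
values_equal = (literal == built)

# Identity comparison: True only if both names point to the very same object.
# In CPython this is typically False for a string built at runtime.
same_object = (literal is built)

print(values_equal)
print(same_object)
```

This is why a test written with `is not` can let a matching string through: the values are equal, but the objects are different.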

You should use the equality operator, which checks value equality for most built-in types:

lambda x: ' ,'.join(x) if x != 'skip me' else None
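One thing to keep in mind: with `apply(..., axis=1)`, each `x` is a whole row rather than a single string, so the comparison probably has to be made per element before joining. The following is a minimal sketch under that assumption, not necessarily how the original function should be structured. It reuses the sample data from the question; the `SKIP` constant, the named `keyword_agg` function, and the `regex=False` argument are choices made for this example.

```python
import numpy as np
import pandas as pd

SKIP = 'skip me'  # placeholder value used in the question


def keyword_agg(row):
    """Join the matched keywords in one row; return None if nothing matched."""
    matched = [value for value in row if value != SKIP]  # compare by value, per element
    return ', '.join(matched) if matched else None


df = pd.DataFrame({
    'Name': ['Dog', 'Cat', 'Cat'],
    'Description': [
        'Dogs are in the house',
        'Cats are in the shed',
        'Categories of cats are concatenated',
    ],
})
words = ['house', 'shed', 'in']

key_list = []
for w in words:
    # regex=False treats the keyword as a plain substring (an assumption for this sketch)
    hit = df['Description'].str.contains(w, regex=False)
    df[w + 'keyword'] = np.where(hit, w, SKIP)
    key_list.append(w + 'keyword')

df['keywords'] = df[key_list].apply(keyword_agg, axis=1)
result = df['keywords'].tolist()
print(result)
```

With this data the first two rows come out as 'house, in' and 'shed, in' (the order follows the keyword list), and the third row has no match, so it is left empty.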
Willem Van Onsem