2

I need to write a strict regular expression to replace certain values in my pandas dataframe. This is an issue that was raised after solving the question that I posted here.

The issue is that the .replace(idsToReplace, regex=True) is not strict. Therefore if the iDsToReplace are:

NY : New York
NYC : New York City

and the comment in which we are replacing the ID is:

My cat from NYC is large.

The resulting response is:

My cat from New York is large.

Is there a pythonic way within the pandas replace function to make the regular expression stricter to match with NYC and not NY?

owwoow14
  • 1,694
  • 8
  • 28
  • 43
  • there's no concept of strictness in regex, it just matches what you tell it to. You may be looking for `\b` word boundaries. – Aaron Sep 21 '17 at 13:53
  • Sorry, do you need replace `My cat from NYC is large.` to `My cat from New York City is large.` if dict is `d = {'NYC': 'New York City', 'NY' : 'New York'}` ? – jezrael Sep 21 '17 at 14:27
  • The issue was that the Word NYC was being captured by NY, instead of NYC. Thus the correct answer is: 'My cat from New York City is large'. I am doing some tests, but so far it seems as if your below answer is working with the `bounds` – owwoow14 Sep 21 '17 at 14:32
  • @owwoow14 - Super, glad can help! – jezrael Sep 22 '17 at 05:17

1 Answers1

2

Add \b for word boundaries to each key of dict:

d = {'UK': 'United Kingdom', 'LA': 'Los Angeles', 'NYC': 'New York City', 'NY' : 'New York'}

data = {'Categories': ['animal','plant','object'],
    'Type': ['tree','dog','rock'],
        'Comment': ['The NYC tree is very big', 'NY The cat from the UK is small',
                    'The rock was found in LA.']
}

d = {r'\b' + k + r'\b':v for k, v in d.items()}

df = pd.DataFrame(data)

df['commentTest'] = df['Comment'].replace(d, regex=True)
print (df)
  Categories                          Comment  Type  \
0     animal         The NYC tree is very big  tree   
1      plant  NY The cat from the UK is small   dog   
2     object        The rock was found in LA.  rock   

                                         commentTest  
0                 The New York City tree is very big  
1  New York The cat from the United Kingdom is small  
2                 The rock was found in Los Angeles.  
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252