3

I know this question has been asked before, but earlier today I found the following code on SO:

import re   

def findIfWordMatch(w):
    return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search

For example, with the following strings the function returns a match object if text1 is found in text2 as a whole word (otherwise it returns None):

text1 = 'DIBUJO'
text2 = 'DIBUJO B308'

So to know if text1 is in text2 I do the following:

if findIfWordMatch(text1)(text2) is not None:
    pass  # doSomething()

And it has been working well until I used these variables:

text1 = 'INT.EST.C.S.'
text2 = 'INT.EST.C.S. B308'

I'm almost sure it has nothing to do with the dots, because I have other variables with a similar structure and it works just fine for them.

I would like to know why this is happening, or another way to find whether a string is inside another.

Thanks in advance

  • 1
    Actually you do not even need regex for that since there is a native operator `in` that does exactly what you are trying to implement... https://stackoverflow.com/questions/3437059/does-python-have-a-string-contains-substring-method Don't reinvent the wheel bro! ;-) – Allan Feb 26 '18 at 04:42

3 Answers

6
'INT.EST.C.S. B308'
            ^^

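A dot followed by a space ('. ') is two non-word characters (\W\W in regex terms), so there is no word boundary between them: \b only matches where a word character meets a non-word character or the start or end of the string (^\w|\w$|\W\w|\w\W). The trailing \b in your pattern therefore fails after the final dot. Use a negative lookbehind (?<!...) and a negative lookahead (?!...) instead.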
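To make the boundary rule concrete, here is a small sketch using the strings from the question. The second pattern, which drops the final dot, is a hypothetical variant added only for contrast.

```python
import re

text2 = 'INT.EST.C.S. B308'

# Even with the dots escaped, the trailing \b sits between '.' and ' '.
# Both are non-word characters, so there is no boundary and the search fails.
ends_with_dot = re.search(r'\bINT\.EST\.C\.S\.\b', text2)

# Hypothetical variant ending in a word character: the trailing \b now sits
# between 'S' and '.', which is a word/non-word transition, so it matches.
ends_with_letter = re.search(r'\bINT\.EST\.C\.S\b', text2)

print(ends_with_dot)
print(ends_with_letter)
```

This would also explain why other dotted values can work: the trailing \b only fails when the search term ends in a non-word character that is followed by another non-word character.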

Regex: (?<!\S){0}(?!\S)
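A minimal sketch of how that regex could be plugged into the function from the question. The name find_if_token_match is hypothetical, and the re.escape call is an addition not present in this answer; it makes the dots (and any other metacharacters) in w match literally.

```python
import re

def find_if_token_match(w):
    # (?<!\S) : not preceded by a non-whitespace character
    # (?!\S)  : not followed by a non-whitespace character
    # re.escape(w) treats every character of w literally (assumed addition).
    pattern = r'(?<!\S){0}(?!\S)'.format(re.escape(w))
    return re.compile(pattern, flags=re.IGNORECASE).search

dotted = find_if_token_match('INT.EST.C.S.')('INT.EST.C.S. B308')
plain = find_if_token_match('DIBUJO')('DIBUJO B308')
partial = find_if_token_match('DIBUJO')('DIBUJOS B308')

print(dotted is not None)
print(plain is not None)
print(partial is not None)
```

Note that this treats whitespace as the only delimiter, so a term directly followed by punctuation (for example 'DIBUJO,') would not match.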

Srdjan M.
1

Try this instead.

text1 = 'INT.EST.C.S.'
text2 = 'INT.EST.C.S. B308'

if text1 in text2:
  print("yes!")
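One caveat worth checking before switching: `in` is a plain substring test, so it is case-sensitive and does not respect word boundaries, unlike the original function with re.IGNORECASE and \b. A short sketch of the differences, using the question's strings plus a hypothetical 'DIBUJOS' value:

```python
text1 = 'INT.EST.C.S.'
text2 = 'INT.EST.C.S. B308'

# Plain substring test: works for the dotted value.
contains = text1 in text2

# Substring matching also accepts partial words (hypothetical example value).
partial = 'DIBUJO' in 'DIBUJOS B308'

# `in` is case-sensitive; folding both sides approximates re.IGNORECASE.
case_sensitive = 'dibujo' in 'DIBUJO B308'
case_folded = 'dibujo'.casefold() in 'DIBUJO B308'.casefold()

print(contains, partial, case_sensitive, case_folded)
```

If partial-word hits or case differences matter for your data, the regex approach is probably still the better fit.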
Jony Karki
0

In a regex, the dot '.' matches any character (except a newline by default), so your pattern matches more than it should.

You can either escape the dots so that they are matched literally:

text1 = r'INT\.EST\.C\.S\.'

Or, since this is a simple pattern, you can use `in` to check whether text1 is contained in text2:

if text1 in text2:
    pass  # doSomething()
Olivier Melançon
  • That is a hack and will work in this specific case only. How does one escape the dots automatically according to the value in the variable `w` in `findIfWordMatch()`? – Shashank Singh Feb 26 '18 at 04:26
  • This is not a hack, it is simply formatting your pattern correctly. `findIfWordMatch` should only be used if a pattern has to be matched, thus the user should know they have to provide a pattern. If it is not, then `in` should be used. – Olivier Melançon Feb 26 '18 at 04:29