I am new to regular expressions in Python. I wrote a pattern that should find certain strings in a text, but it is not working as expected:

import re

def find_short_url(str_field):
    search_string = r"moourl.com|ow.ly|goo.gl|polr.me|su.pr|bit.ly|is.gd|tinyurl.com|buff.ly|bit.do|adf.ly"
    match = re.search(search_string, str(str_field))
    result = match.group(0) if match else None
    return result

It should find the URL shorteners in a text. But su.pr is matching "surpr" in the text. Is there any way to fix it?

find_short_url("It is a surprise that it is ...")

Output:

'surpr'

This could affect the other shorteners too. Still scratching my head.

DataPsycho

1 Answer

Escape the dots:

search_string = r"moourl\.com|ow\.ly|goo\.gl|polr\.me|su\.pr|bit\.ly|is\.gd|tinyurl\.com|buff\.ly|bit\.do|adf\.ly"

In a regex, an unescaped dot matches any character (except a newline, by default), which is why su.pr also matches "surpr". Escaping each dot makes it match only a literal dot.
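If you would rather not escape each dot by hand, one option is to build the pattern from a plain list and let re.escape do it. The sketch below is only an illustration: the names SHORTENER_DOMAINS and SHORTENER_RE are made up here, and the \b word boundaries are an extra assumption (not part of the fix above) to avoid matching inside longer words.

```python
import re

# Hypothetical list name; the domains are the ones from the question.
SHORTENER_DOMAINS = [
    "moourl.com", "ow.ly", "goo.gl", "polr.me", "su.pr", "bit.ly",
    "is.gd", "tinyurl.com", "buff.ly", "bit.do", "adf.ly",
]

# re.escape turns each "." into "\.", so it matches only a literal dot.
# The \b anchors are an added assumption: they keep the pattern from
# matching in the middle of a longer run of word characters.
SHORTENER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(d) for d in SHORTENER_DOMAINS) + r")\b"
)

def find_short_url(str_field):
    """Return the first shortener domain found in the text, or None."""
    match = SHORTENER_RE.search(str(str_field))
    return match.group(0) if match else None

print(find_short_url("It is a surprise that it is ..."))
print(find_short_url("see http://su.pr/abc"))
```

With this version the "surprise" sentence should no longer produce a match, while a text that really contains su.pr still does. If you need every occurrence rather than just the first, SHORTENER_RE.findall(text) returns them as a list.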

Bohemian