I'm trying to parse a link out of some content using a regex. It already works, but I had to use the `replace()` function and rely on `this` as a marker. The problem is that `this` may not always be present. So I'm looking for a solution that produces the same output without either of those two things.

import re

content = """
widgetEvCall('handlers.onMenuClicked', event, this, 'http://www.stirwen.be/medias/documents/20181002_carte_octobre-novembre_2018_FR.pdf')
"""
link = re.findall(r'this,\s*([^)]*)',content.strip())[0].replace("'","")
print(link)

Output:

http://www.stirwen.be/medias/documents/20181002_carte_octobre-novembre_2018_FR.pdf

How can I get the link using pure regex?

MITHU
  • This post about finding URLs with a regex might be what you want: https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string – Inspi Jul 31 '19 at 19:20

1 Answer

You can extract all characters between the single quotes that follow `this,` and any whitespace:

import re

content = """
widgetEvCall('handlers.onMenuClicked', event, this, 'http://www.stirwen.be/medias/documents/20181002_carte_octobre-novembre_2018_FR.pdf')
"""
link = ''
m = re.search(r"this,\s*'([^']*)'", content)
if m:
    link = m.group(1)

print(link)
# => http://www.stirwen.be/medias/documents/20181002_carte_octobre-novembre_2018_FR.pdf
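
Since the question mentions that `this` may not always be present, here is a minimal sketch that does not anchor on it. It assumes (this is my assumption, not something stated in the question) that the link is the only single-quoted argument starting with `http://` or `https://`:

```python
import re

content = """
widgetEvCall('handlers.onMenuClicked', event, this, 'http://www.stirwen.be/medias/documents/20181002_carte_octobre-novembre_2018_FR.pdf')
"""

# Assumption: the URL is the only single-quoted value that begins with
# http:// or https://, so the `this,` anchor is not needed.
m = re.search(r"'(https?://[^']*)'", content)
link = m.group(1) if m else ''

print(link)
```

If the real input can contain several quoted URLs, `re.search` returns only the first one, so the pattern would need a more specific anchor.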


Wiktor Stribiżew
  • Yes, that did it, @Wiktor Stribiżew. A quick question: can't I use `(.*)` instead of `([^']*)`, since they both do the same thing here? Thanks. – MITHU Jul 31 '19 at 19:29
  • @MITHU It depends on what the actual rules are and on the strings you will extract data from. The value you need is inside single quotation marks, and no other single quotation marks are expected inside it, so `[^']*` seems suitable. It is also more efficient than `.*`, which matches any 0+ chars other than line break chars, as many as possible: after grabbing the rest of the line it starts backtracking until it reaches the position before the last `'`, which is then consumed by the `'` part of the pattern. A negated character class is the best choice to grab a substring from `a` to `a` when no other `a` is expected. – Wiktor Stribiżew Jul 31 '19 at 19:59
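
To make the difference concrete, here is a small sketch. The input line is hypothetical (it is not from the question): it adds a second quoted argument after the URL, which is the case where `.*` and `[^']*` stop giving the same result.

```python
import re

# Hypothetical input: a second quoted argument follows the URL on the same line.
line = "widgetEvCall('handlers.onMenuClicked', event, this, 'http://example.com/a.pdf', 'extra')"

# Greedy dot: runs to the end of the line, then backtracks to the LAST quote.
greedy = re.search(r"this,\s*'(.*)'", line).group(1)

# Negated character class: stops at the FIRST quote after the opening one.
negated = re.search(r"this,\s*'([^']*)'", line).group(1)

print(greedy)   # http://example.com/a.pdf', 'extra
print(negated)  # http://example.com/a.pdf
```

On the original content, where the URL is the last quoted value on the line, both patterns capture the same text; they only diverge once another `'` appears later on the line.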