7

I am trying to use a Python regex on a URL string.

>>> import re
>>> id = 'edu.vt.lib.scholar:http/ejournals/VALib/v48_n4/newsome.html'
>>> re.search('news|ejournals|theses', id).group()
'ejournals'
>>> re.findall('news|ejournals|theses', id)
['ejournals', 'news']

According to the docs at http://docs.python.org/2/library/re.html#finding-all-adverbs, search() returns only the first match, while findall() returns all non-overlapping matches in the string.

I am wondering why 'news' is not captured with search even though it is declared first in the pattern.

Did I use the wrong pattern? I want to check whether any of those keywords occurs in the string.

kich

4 Answers

4

You're thinking about it backwards. The regex goes through the target string looking for "news" OR "ejournals" OR "theses" and returns the first one it finds. In this case "ejournals" appears first in the target string.
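
To illustrate, here is a minimal sketch using the string from the question (the variable name `url` is my own choice, to avoid shadowing the built-in `id`). Reordering the alternatives does not change what `re.search()` returns, because the leftmost position in the string wins. The order of alternatives only matters when two of them can match at the same position.

```python
import re

url = 'edu.vt.lib.scholar:http/ejournals/VALib/v48_n4/newsome.html'

# The leftmost match in the string wins, whatever the order of alternatives.
first = re.search('news|ejournals|theses', url)
reordered = re.search('theses|ejournals|news', url)
print(first.group(), reordered.group())  # ejournals ejournals

# Alternation order only breaks ties at the same starting position.
tie_short = re.search('new|news', 'newsome').group()
tie_long = re.search('news|new', 'newsome').group()
print(tie_short, tie_long)  # new news
```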

Joel Cornett
3

The re.search() function stops after the first occurrence that satisfies your condition, not the first option in the pattern.
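
If the goal is only to check whether any of the keywords occurs, stopping at the first occurrence is exactly what you want. A small sketch, assuming a helper name `contains_keyword` and a second test string that I made up for illustration:

```python
import re

# Compile once if the same pattern is reused for many strings.
KEYWORDS = re.compile('news|ejournals|theses')

def contains_keyword(text):
    """Return True if any of the keywords occurs anywhere in text."""
    # search() returns a match object on success and None otherwise.
    return KEYWORDS.search(text) is not None

print(contains_keyword('edu.vt.lib.scholar:http/ejournals/VALib/v48_n4/newsome.html'))  # True
print(contains_keyword('edu.vt.lib.scholar:http/other/index.html'))  # False
```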

Nisan.H
0

Be aware that there are some other differences between search and findall which aren't stated here. For example:

python-regex why findall find nothing, but search works?
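
I can't say from the title alone which difference that question is about, but one well-known difference is how capturing groups are handled: when the pattern contains a group, `findall()` returns the group's contents rather than the whole match, whereas `search().group()` returns the whole match. A short sketch (the pattern and the name `url` are my own, for illustration only):

```python
import re

url = 'edu.vt.lib.scholar:http/ejournals/VALib/v48_n4/newsome.html'

# search().group() returns the whole match, including the \w* tail.
whole = re.search(r'(news)\w*', url).group()
print(whole)  # newsome

# With a capturing group, findall() returns only what the group captured.
captured = re.findall(r'(news)\w*', url)
print(captured)  # ['news']

# A non-capturing group (?:...) makes findall() return whole matches again.
uncaptured = re.findall(r'(?:news)\w*', url)
print(uncaptured)  # ['newsome']
```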

Aaron_ab
  • Please consider moving the content of the question into your answer. –  Aug 19 '17 at 15:33
0

>>> import re
>>> id = 'edu.vt.lib.scholar:http/ejournals/VALib/v48_n4/newsome.html'

>>> re.search('news|ejournals|theses', id).group()
'ejournals'

re.search -> scans the string for the first match and then stops.

>>> re.findall('news|ejournals|theses', id)
['ejournals', 'news']

re.findall -> finds all non-overlapping matches in the string and returns them as a list.
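
If you also want to see where each match sits, `re.finditer()` yields match objects in the order they occur in the string. A rough sketch (the name `url` is my own, used instead of `id` to avoid shadowing the built-in):

```python
import re

url = 'edu.vt.lib.scholar:http/ejournals/VALib/v48_n4/newsome.html'

# Collect (start offset, matched text) pairs in the order they occur.
matches = [(m.start(), m.group())
           for m in re.finditer('news|ejournals|theses', url)]

for start, text in matches:
    print(start, text)
```

The first pair is the one `re.search()` would have returned; the full list of matched texts is what `re.findall()` returns.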