I tried some regex matching in Python. For example:

import re

s = "aaabbb123123"
print(re.search("[ab]*", s))

output : <re.Match object; span=(0, 6), match='aaabbb'> --> OK, this is what I expected.

s = "aaabbb123123"
print(re.search("[.]*", s))

output : <re.Match object; span=(0, 0), match=''> --> why not "aaabbb123123"?

s = "aaabbb123123"
print(re.search("[123]*", s))

output : <re.Match object; span=(0, 0), match=''> --> why not "123123"?

My question: why does a pattern of the form "[anything]*" return an empty match when the characters it could match do not start at the beginning of the target string?

  • The matches are succeeding, but the `*` means that an empty string is a permissible match. Use `+` instead? – CertainPerformance Nov 04 '19 at 01:38
  • The first/leftmost match (fundamental part of matching) is more important than the longest match (behaviour of individual quantifiers). Also, `[.]` is a literal period, different from `.`. Are you trying to get the longest match anywhere? Or just any non-empty match? – Ry- Nov 04 '19 at 01:38
  • Greediness means individual phases of the matching process attempt to match as much as possible, not that the search attempts to find the longest possible match for the overall regex. – user2357112 Nov 04 '19 at 01:41
  • Also `.` isn't special inside a character class. – user2357112 Nov 04 '19 at 01:42
  • In your second and third cases you are looking for 0 or more of a period (case 2) or the numbers 1, 2, or 3 (case 3). In both cases that matches at the beginning of the string because there are 0 periods and 0 of 1,2 or 3. Your first case gives a longer result because the first character is an a, so the regex keeps matching until it can't any more. – Nick Nov 04 '19 at 01:43
  • Your second case should actually be `.*` which will match the entire string (`.` inside a character class matches a literal `.`, not any character). For your third case, using a `+` forces the regex to look for a 1, 2 or 3, and having found one, it will then keep matching until there are no more of those characters. – Nick Nov 04 '19 at 01:45
  • I got it. I didn't realize that leftmost takes priority over longest, or that `[.]` is a set containing only the character '.'. Thank you, guys. – tanchihpin0517 Nov 04 '19 at 01:52
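
The points raised in the comments can be checked with a short sketch. It reuses the string `s` from the question; the second string `t`, the alternation pattern, and the helper `longest_match` are hypothetical names introduced only for illustration, and the helper is just one possible way to get the longest match anywhere rather than the leftmost one.

```python
import re

s = "aaabbb123123"

# `*` allows zero repetitions, so the search succeeds immediately at
# index 0 with an empty match and never looks further right.
empty = re.search("[123]*", s)
print(empty.span(), repr(empty.group()))    # (0, 0) ''

# `+` requires at least one character, so the search has to move on
# until it reaches the first digit.
digits = re.search("[123]+", s)
print(digits.span(), repr(digits.group()))  # (6, 12) '123123'

# Outside a character class, `.` matches any character except a newline.
whole = re.search(".*", s)
print(whole.group())                        # aaabbb123123

# Inside a character class, `.` is a literal period; s contains none,
# so requiring at least one of them finds no match at all.
print(re.search("[.]+", s))                 # None


def longest_match(pattern, text):
    """Hypothetical helper: return the longest non-overlapping match
    found by scanning left to right, or None if nothing matches."""
    return max(
        re.finditer(pattern, text),
        key=lambda m: m.end() - m.start(),
        default=None,
    )


# Assumed example input: re.search returns the leftmost match ('ab'),
# while the helper picks the longest one ('123123').
t = "ab123123"
print(re.search("[ab]+|[123]+", t).group())      # ab
print(longest_match("[ab]+|[123]+", t).group())  # 123123
```

If any non-empty match is enough, switching `*` to `+` is the smaller change; the helper is only needed when the longest run anywhere in the string is what matters.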
