I was working on a regex that simplifies to this code:

>>> import re
>>> re.sub(r'^.*$|', "xyz", "abc")
'xyzxyz'

I expected it to replace abc with xyz: since the RE ^.*$ matches the whole string, the engine should just return that match and stop. So I ran the same regex with re.findall().

>>> re.findall(r'^.*$|', 'abcd')
['abcd', '']

The docs say:

A|B, where A and B can be arbitrary REs. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match.

But then why is the regex matching an empty string?

phokat
  • Your pattern (equal to `(?:^.*$)?`) can match an empty string, so it matches the whole string and then an empty string at the end of the string. – Wiktor Stribiżew Nov 19 '20 at 13:19
  • @WiktorStribiżew but `$` is the end of the string, so there shouldn't be any empty string left to match once everything has already been matched – phokat Nov 19 '20 at 13:37
  • No, it does not matter whether there is a `$` or not, since your whole pattern is optional. It simply matches either the whole string from start to end, or an empty string at the end of the string. So, when running `re.findall`, which returns all matches, you get two. As usual, if you expect just one, use `re.search`. – Wiktor Stribiżew Nov 19 '20 at 13:39
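
The behaviour described in the comments above can be checked with a minimal sketch like the one below. It assumes Python 3.7 or later, where `re.sub` also replaces an empty match that directly follows a non-empty one. The variable names are only illustrative. `finditer` is used because it exposes the span of each match, which makes the empty match at the end of the string visible.

```python
import re

pattern = re.compile(r'^.*$|')

# Each scan restarts where the previous match ended. At offset 4 the
# first alternative fails (^ no longer matches), so the empty one wins.
matches = [(m.span(), m.group()) for m in pattern.finditer("abcd")]
print(matches)  # [((0, 4), 'abcd'), ((4, 4), '')]

# search returns only the first match and does not scan any further.
first = pattern.search("abcd").group()
print(first)  # abcd

# count=1 limits sub to the first match, so the trailing empty match
# is left alone.
replaced_once = pattern.sub("xyz", "abc", count=1)
print(replaced_once)  # xyz

# Without count, both matches are replaced.
replaced_all = pattern.sub("xyz", "abc")
print(replaced_all)  # xyzxyz
```

The rule quoted from the docs applies to a single match attempt at one position. `findall`, `finditer` and `sub` keep scanning after each match, which is where the second, empty match comes from.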
