2

This call gives me neither an error nor a result:

re.sub('\\.(\\W|\\.)*[o0](\\W|[o0])*', '*', '..........................................')  

Why does it behave this way? If I reduce the number of periods, it works.

Thank you.

Squall Leohart

2 Answers

8

You've got catastrophic backtracking.

Katriel
5

You have no o or 0 in your input string, yet your regular expression requires at least one of those characters to be there ([o0]).

>>> re.compile('\\.(\\W|\\.)*[o0](\\W|[o0])*', re.DEBUG)
literal 46
max_repeat 0 65535
  subpattern 1
    branch
      in
        category category_not_word
    or
      literal 46
in
  literal 111
  literal 48
max_repeat 0 65535
  subpattern 2
    branch
      in
        category category_not_word
    or
      in
        literal 111
        literal 48

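Update: Your regular expression is suffering from catastrophic backtracking. Avoid alternating between a character class and a literal or character set inside a repeated group (the branch .. or parts inside a max_repeat listed above). When the alternatives can match the same character, as \W and \. both match a period, the engine has to try every combination of choices before it can report a failed match. You can put character classes inside a character set to avoid this.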
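To get a feel for the numbers, here is a rough model of the work involved. It assumes a naive backtracking matcher with no pruning, so it is an estimate rather than a measurement of what Python's engine actually does, and failed_attempts is a hypothetical helper name. After the leading \. has matched, the group (\W|\.)* can consume k of the remaining periods in 2**k different ways, and every one of them then fails at [o0]:

```python
def failed_attempts(n_periods):
    """Estimate the paths tried from the first period of an all-period string.

    Assumption: a naive backtracking matcher that explores every way
    (\\W|\\.)* can consume the input before giving up at [o0].
    """
    remaining = n_periods - 1  # characters left after the leading \.
    # Consuming k characters can be done in 2**k ways, because each
    # period matches either \W or \. inside the group.
    return sum(2 ** consumed for consumed in range(remaining + 1))


for n in (5, 10, 20, 42):
    print(n, failed_attempts(n))
```

Under this model the count roughly doubles with every extra period, which fits the observation that the call returns for short inputs but appears to hang on the 42-period string.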

Also note that you can use the r'' raw string notation to avoid all the escaped backslashes.

The following works:

re.sub(r'\.[\W\.]*[o0][\Wo0]*', '*', '..........................................')

because it compiles to:

>>> re.compile(r'\.[\W\.]*[o0][\Wo0]*', re.DEBUG)
literal 46
max_repeat 0 65535
  in
    category category_not_word
    literal 46
in
  literal 111
  literal 48
max_repeat 0 65535
  in
    category category_not_word
    literal 111
    literal 48

Note that now the branches are gone.
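As a sanity check, the two patterns should make the same substitutions, since only the internal structure changed. The sketch below compares them on a few short sample strings; the samples and the censor helper are made up for illustration, and the inputs are kept short so that the original pattern stays fast as well:

```python
import re

# Original pattern from the question and the rewritten one from above.
ORIGINAL = re.compile('\\.(\\W|\\.)*[o0](\\W|[o0])*')
REWRITTEN = re.compile(r'\.[\W\.]*[o0][\Wo0]*')

# Hypothetical sample inputs, deliberately short.
samples = ['.o', '..0..', 'a.o!b', '....', 'foo', '. o 0 .x']


def censor(pattern, text):
    """Replace every match of pattern in text with a single asterisk."""
    return pattern.sub('*', text)


# Map each sample to the pair (original result, rewritten result).
results = {s: (censor(ORIGINAL, s), censor(REWRITTEN, s)) for s in samples}

for sample, (old, new) in results.items():
    print(repr(sample), repr(old), repr(new), old == new)
```

This only shows agreement on the chosen samples, not a proof of equivalence, but it is a cheap way to catch a mistake when flattening alternations into character sets.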

Martijn Pieters
  • I understand, and I do not expect the string to be matched. But why does it not give me an answer? (it seems like it's running an infinite loop, or gets stuck or something) – Squall Leohart Aug 18 '12 at 01:30
  • @SquallLeohart: that's an important detail; catastrophic backtracking will do that to you. – Martijn Pieters Aug 18 '12 at 07:45