
I'm trying to match *=, &=, +=, -=, |=, and ^= with a regular expression, but for some reason the pattern below also accepts <= and >=:

modifyPat = re.compile(r'\s*[&\|\*/%\+-^]*=[^=]*')

I've done some digging and found that the problem comes from including the ^ character in the pattern. If I remove the ^, as in the pattern below, I get the expected matching behavior, but of course I lose the ability to match ^=:

modifyPat = re.compile(r'\s*[&\|\*/%\+-]*=[^=]*')

What is going on here, and is there a way to include the ^ character so that ^= matches but <= and >= do not?
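A minimal reproduction of the behavior described above. It assumes the patterns are applied with `re.match` to text that starts at the operator (the question does not say how they are applied), and the sample inputs are hypothetical. Raw strings are used so that `\s` and the other backslashes reach the regex engine unchanged.

```python
import re

# The two patterns from the question, written as raw strings.
with_caret = re.compile(r'\s*[&\|\*/%\+-^]*=[^=]*')
without_caret = re.compile(r'\s*[&\|\*/%\+-]*=[^=]*')

# Hypothetical sample inputs: each operator followed by an operand.
for op in ['*=', '&=', '+=', '-=', '|=', '^=', '<=', '>=']:
    text = op + ' 5'
    print(f'{op!r}: with ^ -> {bool(with_caret.match(text))}, '
          f'without ^ -> {bool(without_caret.match(text))}')
```

With the ^ present, all eight operators match; without it, `^=`, `<=` and `>=` are all rejected.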

user3570982

1 Answer


Character sets (the [...] syntax) have a lot of latitude and a metasyntax of their own. Part of your set reads:

[+-^]

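This means every character from + through ^ in the ASCII table. That is a lot of characters, including < and > as well as all digits and uppercase letters. The second pattern in your question only works because there the - is the last character in the set, and in that position it is taken literally instead of forming a range.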
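A quick way to see exactly what the range covers is to ask the regex engine which ASCII characters the class `[+-^]` accepts. This is a small sketch using only the standard `re` module:

```python
import re

# Collect every ASCII character that the class [+-^] accepts.
plus_to_caret = re.compile(r'[+-^]')
accepted = ''.join(chr(c) for c in range(128) if plus_to_caret.fullmatch(chr(c)))
print(accepted)
# prints: +,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^
```

Lowercase letters are not included, because they come after ^ (0x5E) in ASCII.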

To fix it:

[\+\-\^]

Escaping any punctuation that could have a special meaning inside a set is usually a good idea, even where it is not strictly necessary.
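Applied to the full pattern from the question, the fix could look like the sketch below. The second variant, which builds the set with `re.escape`, is an alternative not taken from the answer; `re.escape` backslash-escapes regex metacharacters, including `-` and `^`. The sample inputs are hypothetical.

```python
import re

# The question's pattern with the set members escaped, so '-' and '^' are
# literal characters instead of the endpoints of a '+'..'^' range.
modify_pat = re.compile(r'\s*[&\|\*/%\+\-\^]*=[^=]*')

# Alternative: let re.escape produce the escaped set contents.
modify_pat_escaped = re.compile(r'\s*[' + re.escape('&|*/%+-^') + r']*=[^=]*')

# Hypothetical sample inputs: each operator followed by an operand.
for text in ['*= 2', '&= 2', '+= 2', '-= 2', '|= 2', '^= 2', '<= 2', '>= 2']:
    print(repr(text), bool(modify_pat.match(text)), bool(modify_pat_escaped.match(text)))
```

`re.match` is used because it anchors at the start of the string. With `re.search`, the `=` inside `<=` would still be found, since the set is quantified with `*` and may match nothing.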

tadman
  • Thank you for the explanation. I almost never use character ranges, and for some reason forgot that the `-` was used for that purpose. – user3570982 Feb 15 '17 at 20:22
  • Regular expressions have a reputation for being notoriously tricky for precisely this reason. – tadman Feb 15 '17 at 20:28