I have to split a long filename with several delimiters (e.g. "AA_BB1234_CC123456789_2020-01-31_001.xml") into parts in order to build a new name from it, using the date part and the current file number at the end. By pure chance I found out that

data = re.split(r'[_-.]', filename)

throws a "bad character range" error, but if I change the order to hyphen, underscore, dot, it works just fine:

data = re.split(r'[-_.]', filename)

Why is that?

Wiktor Stribiżew
Dirk
  • That's because the hyphen needs to be escaped: inside a character set it is used for ranges such as `a-zA-Z0-9`, so to match it literally you need to add a backslash, e.g. `a-zA-Z0-9\-` – Triby Jan 31 '20 at 19:08

1 Answer

The hyphen is a special character when it appears within a character set, but only when it's between two other characters. In this situation, it defines a range for the set. When it's placed as the rightmost or leftmost character, it's treated as a literal hyphen.

[a-dz]: Matches a, b, c, d, or z

[-adz]: Matches -, a, d, or z

[adz-]: Matches -, a, d, or z
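
Applied to the pattern from the question: in `[_-.]` the hyphen sits between `_` (code point 95) and `.` (code point 46), so it is read as a range that runs backwards, which is what triggers the "bad character range" error. A minimal sketch using the filename from the question (the variable names are only illustrative):

```python
import re

filename = "AA_BB1234_CC123456789_2020-01-31_001.xml"

# Hyphen between two characters: parsed as the range '_'..'.', which is
# reversed (ord('_') == 95 > ord('.') == 46), so compiling the pattern fails.
try:
    re.compile(r'[_-.]')
    range_error = None
except re.error as exc:
    range_error = str(exc)  # message mentions "bad character range"

# Hyphen first, hyphen last, or escaped: treated as a literal hyphen.
parts_first = re.split(r'[-_.]', filename)
parts_last = re.split(r'[_.-]', filename)
parts_escaped = re.split(r'[_\-.]', filename)

print(range_error)
print(parts_first)
# ['AA', 'BB1234', 'CC123456789', '2020', '01', '31', '001', 'xml']
```

All three working variants split the name identically, so which one to use is mostly a matter of taste; escaping the hyphen keeps the pattern correct even if someone later reorders the characters.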

There's an exception to this when using a negated character set, as the ^ itself is also a special character:

[^-adz]: Matches a character other than -, a, d, or z (not a range between ^ and a)
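
A short sketch of the negated case, with made-up test strings. The second pattern is included only as a contrast: when `^` is not the first character it is an ordinary character, and a hyphen after it does form a range.

```python
import re

# '^' right after '[' negates the set; the '-' that follows is a literal hyphen.
negated = re.findall(r'[^-adz]', "a-b-z")
print(negated)  # ['b']

# Contrast: here '^' is not first, so '^-a' is a real range covering
# '^' (94), '_' (95), '`' (96) and 'a' (97); 'd' is matched separately.
ranged = re.findall(r'[d^-a]', "^_`abd")
print(ranged)  # ['^', '_', '`', 'a', 'd']
```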

CAustin