5

If I have a regex that is [0-Z] or [a-Z] - what characters would it match? Is it valid regex? Can you have ranges in regex outside of 0-9, a-z and A-Z?

Billy Moon

4 Answers

3

Yes, you can have other ranges. From MSDN, "Character Classes in Regular Expressions":

The syntax for specifying a range of characters is as follows:

[firstCharacter-lastCharacter]

where firstCharacter is the character that begins the range and lastCharacter is the character that ends the range. A character range is a contiguous series of characters defined by specifying the first character in the series, a hyphen (-), and then the last character in the series. Two characters are contiguous if they have adjacent Unicode code points.

So, in the end, [0-Z] will match 0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ. You can check the ASCII table for the characters between 0 and Z.
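To double-check that list, here is a minimal sketch using Python's `re` module (one engine among many, so other engines may behave differently). It builds the expected string from the code points and compares it with what the class actually matches in the ASCII range. The variable names are only illustrative.

```python
import re

# Every code point from '0' (U+0030) through 'Z' (U+005A), inclusive.
expected = "".join(chr(cp) for cp in range(ord("0"), ord("Z") + 1))

# Characters in the ASCII range that the class actually matches.
pattern = re.compile(r"[0-Z]")
matched = "".join(chr(cp) for cp in range(128) if pattern.fullmatch(chr(cp)))

print(matched)              # 0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(matched == expected)  # True
```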

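As for [a-Z], a (U+0061) has a higher code point than Z (U+005A), so the range runs backwards and contains no characters. Depending on the engine, it will either match nothing or be rejected as an invalid range.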

Keep in mind that, as a general rule, a range can span far more than ASCII, since it is defined over Unicode code points. Ultimately it depends on the implementation, so if in doubt, check how your engine handles it.

acdcjunior
  • Whether it's ASCII or Unicode depends on the regex implementation. There is more than just the Microsoft regex engine. ;) – Amber Oct 16 '13 at 17:25
  • @Amber, you are correct. I added a note. Depending on the tool, it is worth checking the support. Thanks. – acdcjunior Oct 16 '13 at 17:33
3

The range [0-Z] is valid. Depending on the regex engine, [a-Z] will either be invalid or be a range that can't match any characters. In a character class range, the start and end characters are just code points, and all characters between those code points are included in the range.

In the case of [0-Z], this is equivalent to the following more readable character class:

[0-9:;<=>?@A-Z]
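As a quick sanity check of that equivalence, the following sketch compares the two classes character by character using Python's `re` module. It only covers the Basic Multilingual Plane and only this one engine, so treat it as an illustration rather than a proof. The names `compact`, `spelled_out` and `differences` are made up for the example.

```python
import re

compact = re.compile(r"[0-Z]")
spelled_out = re.compile(r"[0-9:;<=>?@A-Z]")

# Collect every character on which the two classes disagree.
differences = [
    chr(cp)
    for cp in range(0x10000)
    if bool(compact.fullmatch(chr(cp))) != bool(spelled_out.fullmatch(chr(cp)))
]

print(differences)  # [] means the classes agree on every character tested
```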

In the case of [a-Z], this is actually a character class that won't match anything because a has a higher code point than Z.

You can see the code points in the ASCII table at http://www.asciitable.com/.

Andrew Clark
  • I can't imagine `[a-Z]+` would compile. It would sit in a loop forever, or be rejected by the optimizer and not even match an empty string. –  Oct 16 '13 at 18:21
1

Ranges depend on the characters' (Unicode) values. A range like [0-9] makes sense, but a range like [9-0] does not. Likewise, the range in [a-Z] will be empty because 'a' is greater than 'Z'. (All the uppercase letters come first, and there are intervening characters between 'Z' and 'a'.) Rely on a table of character values (pull up charmap on Windows), and don't get fancy.
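How a backwards range is handled varies by engine. As one data point, Python's `re` module refuses to compile it rather than treating it as empty. The sketch below assumes Python only, and `describe` is a hypothetical helper written for this example.

```python
import re

def describe(pattern):
    """Return 'ok' if the pattern compiles, otherwise the engine's error text."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return f"rejected: {exc}"
    return "ok"

# Forward ranges compile; ranges whose start is above their end do not.
for candidate in ("[0-9]", "[9-0]", "[0-Z]", "[a-Z]", "[A-z]"):
    print(candidate, "->", describe(candidate))
```

The exact wording of the error depends on the Python version, so it is safer to catch `re.error` than to match on the message.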

fred02138
  • Actually, that depends on the regex implementation. Not all regex engines respect Unicode; many are ASCII-only. – Amber Oct 16 '13 at 17:25
  • @Amber - thanks, I stand corrected. – fred02138 Oct 16 '13 at 17:26
  • `a range from [a-Z] will be empty`: most engines throw errors like `invalid range` or, in the case of `[]`, empty class. Class contents are parsed as isolated units: valid ranges are a unit, and hex/octal escapes and properties are units too. Everything else is a single-char unit. So `[a-z-A-Z]` is equivalent to the better-known `[a-zA-Z-]` (and identical to [a-r-X-Z-a-z-A-Z-], although that is cryptic). –  Oct 16 '13 at 18:07
1

You can create any range as long as the characters' Unicode values run from lower to higher. Take ASCII for example. a has a higher value than Z, so the range a-Z is invalid. The range A-z is valid, but note that it includes non-letter characters like ^ and [. 0-Z is also valid and includes :, ?, and a whole bunch of other characters you probably don't want.

To answer your question, you can create any range in the right order. It may not be useful to use something like A-z, but something like a-d is pretty common.

Regex engines may react differently to ranges that are out of order or otherwise invalid.
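To see exactly which non-letters sneak into A-z, here is a small sketch with Python's `re` module, checking only the ASCII range. The variable names are illustrative.

```python
import re

pattern = re.compile(r"[A-z]")

# Everything in ASCII that the class matches, then just the non-letters.
matched = [chr(cp) for cp in range(128) if pattern.fullmatch(chr(cp))]
non_letters = "".join(ch for ch in matched if not ch.isalpha())

print(non_letters)  # [\]^_`
```

Those six characters sit between Z and a in the table, which is why [A-Za-z] is usually what you want instead.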

Explosion Pills