Is the regular expression [a-Z] valid and if yes then is it the same as [a-zA-Z]?

Question

Is the regular expression [a-Z] valid and if yes then is it the same as [a-zA-Z]? Please note that in [a-Z] the a is lowercase and the Z is uppercase.

Edit:

I received some answers specifying that while [a-Z] is not valid, [A-z] is valid (but won't be the same as [a-zA-Z]), and this is really what I was looking for, since I wanted to know in general whether it's possible to replace [a-zA-Z] with a more compact version.

Thanks to all who contributed to the answer.

John Kugelman · Accepted Answer · 2009-11-02T00:14:18.627

35

No, a (97) is higher than Z (90). [a-Z] isn't a valid character class. However [A-z] wouldn't be equivalent either, but for a different reason. It would cover all the letters but would also include the characters between the uppercase and lowercase letters: [\]^_`.
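The arithmetic behind this can be checked without any regex engine at all. This is only a small sketch in Python (not a language the question mentions), and the variable names are made up for illustration:

```python
# Code points of the two endpoints of [a-Z].
start, end = ord("a"), ord("Z")
print(start, end)  # 97 90

# A range is only well-formed when its start does not exceed its end.
is_ascending = start <= end
print(is_ascending)  # False

# The characters that sit strictly between Z and a, which [A-z] would pick up.
between = "".join(chr(c) for c in range(ord("Z") + 1, ord("a")))
print(between)  # [\]^_`
```

This assumes plain ASCII/Unicode code point order, which is what most regex engines use outside of locale-aware tools.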

edited Nov 02 '09 at 00:14

answered Nov 02 '09 at 00:07

John Kugelman

Yes it is... `[a-Z]` is invalid because `Z` comes before `a` – gnarf Nov 02 '09 at 00:14
I explained why both `[a-Z]` and `[A-z]` are invalid. Don't downvote me for doing extra credit. :-) – John Kugelman Nov 02 '09 at 00:19
I am unsure whether regexes are only specified for ASCII. Couldn't this also be dependent on the encoding and collation? – Svante Nov 02 '09 at 07:15
[a-Z] is invalid in the C locale, yes. In that locale, the numeric value of the encoded character is the order. But that does not apply to many other locales (for example en_US.utf8). In that locale, [a-Z] represents an existing collation order and therefore is valid. Furthermore, it represents all the upper and lower letters in the ASCII range. – May 16 '18 at 18:08
Easily the best answer here with reference to the op's question. Perhaps it is also worth adding in suggestions about making the regular expression case insensitive to aid readability for anything more complex than these simple examples, if possible in the library/language variant in use (e.g. `/[a-z]/i` or `(?i)[a-z]`) – David Long Jul 12 '18 at 13:55

score 4 · Answer 2 · edited Jan 04 '11 at 00:31

I'm not sure about other languages' implementations, but in PHP you can do

"/[a-z]/i"

and it will be case-insensitive. There is probably something similar for other languages.
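As one example of "something similar", here is a rough Python equivalent. It is a sketch rather than a recommendation, and the sample string and variable names are invented for the illustration. Python accepts the flag either as an argument or inline as `(?i)`:

```python
import re

# Three ways to accept ASCII letters of either case.
flag_form = re.compile(r"[a-z]+", re.IGNORECASE)
inline_form = re.compile(r"(?i)[a-z]+")
explicit_form = re.compile(r"[a-zA-Z]+")

sample = "Monkey_42"  # made-up input
results = [p.match(sample).group() for p in (flag_form, inline_form, explicit_form)]
print(results)  # ['Monkey', 'Monkey', 'Monkey']

# In Python's default Unicode mode, the case-insensitive form also accepts a
# few non-ASCII characters such as the Kelvin sign; re.ASCII switches that off.
kelvin = "\u212a"
unicode_hit = bool(re.fullmatch(r"[a-z]", kelvin, re.IGNORECASE))
ascii_hit = bool(re.fullmatch(r"[a-z]", kelvin, re.IGNORECASE | re.ASCII))
print(unicode_hit, ascii_hit)  # True False
```

So the flag is more compact than [a-zA-Z], but depending on the engine it may not be strictly identical for non-ASCII input.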

edited Jan 04 '11 at 00:31

Peter Mortensen

answered Nov 02 '09 at 00:05

helloandre

Most of PHP's features come from Perl, including this one. (PHP used to be written in Perl. Actually one of the P's used to stand for Perl) – Brad Gilbert Nov 02 '09 at 01:33

score 3 · Answer 3 · answered Nov 02 '09 at 00:09

You don't specify what language, but in general [a-Z] won't be a valid range, as in ASCII the lower-case alpha characters come after the upper-case ones. [A-z] might be a valid range (indicating all upper- and lower-cased alphas as well as the punctuation that appears between Z and a), but it might not be, depending on your particular implementation. The i flag can be added to the regex to make it case-insensitive; check your particular implementation for instructions on how to specify that flag.

score 2 · Answer 4 · edited Oct 17 '12 at 16:45

I've just fallen over this in a script (not my own).

It seems that grep, awk, and sed accept or reject [a-Z] depending on your locale (i.e. the LANG or LC_CTYPE environment variable). In the POSIX locale, [a-Z] isn't allowed by these tools, but in some other locales (e.g. en_gb.utf8) it works, and is the same as [a-zA-Z].

Yes, I've checked, it doesn't match any of _^[]`.

Given that this has taken quite some time to debug, I strongly discourage anyone from ever using [a-Z] in a regex.

score 2 · Answer 5 · edited Nov 02 '09 at 00:05

You could always try it:

 print "ok" if "monkey" =~ /[a-Z]/;

Perl says

Invalid [] range "a-Z" in regex; marked by <-- HERE in m/[a-Z <-- HERE ]/ at a-z.pl line 4.
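For comparison, the same experiment can be sketched in Python. The helper name `try_pattern` is hypothetical, and the exact wording of the error differs from Perl's. Note that a pattern compiling without complaint says nothing about whether it matches what you intended:

```python
import re

def try_pattern(pattern):
    """Return None if the pattern compiles, otherwise the error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

a_to_Z = try_pattern("[a-Z]")  # reversed range: an error message comes back
A_to_z = try_pattern("[A-z]")  # compiles fine, but is wider than [a-zA-Z]
print(a_to_Z)
print(A_to_z)
```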

edited Nov 02 '09 at 00:05

Jeff Atwood

answered Nov 02 '09 at 00:04

Exactly what I said. My favorite saying is "try it 'n c" because if you happen to be developing in C at the time it has two meanings. – Robert Massaioli Nov 02 '09 at 00:07
I don't like "try it and see" because if he had tried `[A-z]` there'd be no error message but it wouldn't work right either. – John Kugelman Nov 02 '09 at 00:09
This is because in ASCII, uppercase comes first. So, [A-z] is valid, but [a-Z] is not. – jheddings Nov 02 '09 at 00:09
But he's not asking that question. The question is very clear. Why are you deliberately misinterpreting it? – Nov 02 '09 at 00:18

score 2 · Answer 6 · answered Nov 02 '09 at 00:12

If it's valid, it won't do what you expect.

The character code of Z is lower than the character code of a, so if the codes are swapped to mean the range [Z-a], it will be the same as [Z\[\\\]^_`a], i.e. it will include the characters Z and a, and the characters between.

If you use [A-z] to get all upper and lower case characters, that is still not the same as [A-Za-z], it's the same as [A-Z\[\\\]^_`a-z].
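These equivalences can be checked by brute force over the ASCII range. The following is a sketch in Python, assuming an engine that orders ranges by code point; the helper `matched` is a made-up name for this illustration:

```python
import re

ASCII_CHARS = [chr(c) for c in range(128)]

def matched(pattern):
    """Return the set of single ASCII characters the pattern fully matches."""
    return {c for c in ASCII_CHARS if re.fullmatch(pattern, c)}

wide = matched(r"[A-z]")
spelled_out = matched(r"[A-Z\[\\\]^_`a-z]")
letters = matched(r"[A-Za-z]")

print(wide == spelled_out)             # True
print("".join(sorted(wide - letters)))  # [\]^_`
print(matched(r"[Z-a]") == set("Z[\\]^_`a"))  # True
```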

score 1 · Answer 7 · edited Jan 04 '11 at 00:29

No, it's not valid, probably because in ASCII the value of a (97) is higher than that of Z (90), so the range does not run in ascending order.

edited Jan 04 '11 at 00:29

Peter Mortensen

answered Nov 02 '09 at 00:06

ennuikiller
