For some reason (I don't know why; maybe something isn't quite right in my system or in my brain), the regular expression "[A-Z]" doesn't seem to match the letter "W", and "[a-z]" doesn't seem to match the letter "w". Example:
for x in A a B b C c D d E e F f G g H h I i J j K k L l M m N n O o P p Q q R r S s T t U u V v W w X x Y y Z z; do echo "$x" | egrep "[A-Za-z]"; done
My output is: A a B b C c D d E e F f G g H h I i J j K k L l M m N n O o P p Q q R r S s T t U u V v X x Y y Z z
As you can see, the letters "W" and "w" are both missing. Am I the only one? What could possibly cause this? If it's a bug, where do I report it? This happens in both bash and zsh, and with both sed and egrep (possibly more; I only tested those two), so the problem seems to be with regular expressions in general… :o So… what is going on??
- Manjaro 17.1.12
- XFCE 4.12
- bash 4.4.23(1)-release (x86_64-unknown-linux-gnu)
- zsh 5.5.1 (x86_64-unknown-linux-gnu)
- egrep 3.1
- sed 4.5
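One way to narrow this down might be to rerun the same test with the locale forced to C and compare it with the current locale. This is only a sketch, and it assumes that the locale is what changes the meaning of the A-Z range, which I have not confirmed. The helper name missing_letters is made up for this example.

```shell
# List the ASCII letters that a bracket expression fails to match
# under a given locale. "missing_letters" is a made-up helper name.
missing_letters() {
    # $1 = locale to force via LC_ALL, $2 = bracket expression to test
    for x in A a B b C c D d E e F f G g H h I i J j K k L l M m \
             N n O o P p Q q R r S s T t U u V v W w X x Y y Z z; do
        # grep -q prints nothing; only its exit status is used here
        if ! printf '%s\n' "$x" | LC_ALL="$1" grep -q -E "$2"; then
            printf '%s ' "$x"
        fi
    done
    echo
}

# In the C locale the range is plain ASCII order, so nothing should be missing.
missing_letters C '[A-Za-z]'

# Same test under whatever LANG is currently set to (falls back to C if unset).
missing_letters "${LANG:-C}" '[A-Za-z]'
```

If the first call prints an empty line and the second one lists W and w, that would point at the locale rather than at grep or sed themselves.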
Edit: Someone asked for my locale, so here it is.
$ locale
LANG=sv_SE.utf8
LC_CTYPE="sv_SE.utf8"
LC_NUMERIC=sv_SE.UTF-8
LC_TIME=sv_SE.UTF-8
LC_COLLATE="sv_SE.utf8"
LC_MONETARY=sv_SE.UTF-8
LC_MESSAGES="sv_SE.utf8"
LC_PAPER=sv_SE.UTF-8
LC_NAME=sv_SE.UTF-8
LC_ADDRESS=sv_SE.UTF-8
LC_TELEPHONE=sv_SE.UTF-8
LC_MEASUREMENT=sv_SE.UTF-8
LC_IDENTIFICATION=sv_SE.UTF-8
LC_ALL=
If this is the problem, then I guess whatever defines the sv_SE.UTF-8 locale is out of date, because the letter "w" was added to the Swedish alphabet in 2006. Also, if the A-Z range depends on the current locale, shouldn't [A-Ö] cover the whole Swedish alphabet when the locale is set to Swedish? It doesn't; it gives an error message. However, [[:alpha:]] seems to include all Swedish letters, so I guess I'm happy with that.
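For reference, this is roughly how I use the [[:alpha:]] class instead of a range. It is a minimal sketch: the helper name letters_only is made up, and the example forces the C locale only so that the result is reproducible on any machine. Under sv_SE.UTF-8 the same class also matched the Swedish letters for me.

```shell
# Keep only the lines that consist entirely of letters.
# "letters_only" is a made-up helper name; $1 is the locale to use (default C).
letters_only() {
    LC_ALL="${1:-C}" grep -E '^[[:alpha:]]+$'
}

# W and w are matched by the class, the line with digits is dropped.
printf 'W\nw\n42\nxyz\n' | letters_only C
```

The class asks "is this a letter?" directly, so it does not depend on where W happens to sort relative to A and Z.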