I call myself a POSIX shell wizard, but today I fell flat on my face.
Here is nothing strange:
bash# printf 'v10\nv1.' | sort
v1.
v10
because "." has code 0x2e and "0" has code 0x30. But how about this:
bash# printf 'v101\nv1.1' | sort
v101
v1.1
WTF? OK, I am a wizard, so I checked my locale:
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME=en_DK.utf8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
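Which of these settings does sort actually obey? POSIX gives LC_ALL priority over the LC_COLLATE category, and LC_COLLATE priority over LANG. Below is a minimal sketch of that lookup for POSIX sh. effective_collate is a made-up helper name, not a real command, and the fallback when nothing is set is implementation-defined (C/POSIX on common systems).

```shell
#!/bin/sh
# Hypothetical helper: report which locale sort(1) uses for collation,
# following the POSIX precedence LC_ALL > LC_COLLATE > LANG.
# Empty variables are treated the same as unset ones.
effective_collate() {
  if [ -n "${LC_ALL-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "${LC_COLLATE-}" ]; then
    printf '%s\n' "$LC_COLLATE"
  elif [ -n "${LANG-}" ]; then
    printf '%s\n' "$LANG"
  else
    # Nothing set: the default is implementation-defined (usually C/POSIX).
    printf 'C\n'
  fi
}

effective_collate                # current environment
(LC_ALL=C; effective_collate)    # what "LC_ALL=C sort" would see
```

With the settings shown above, LC_COLLATE and LANG are both en_US.UTF-8, so a bare sort collates with en_US.UTF-8 rules, while an LC_ALL=C prefix overrides both.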
So:
bash# printf 'v101\nv1.1' | LC_ALL=C sort
v1.1
v101
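In the C locale, sort compares raw bytes, so the first differing byte decides. A small sketch that checks both byte values and finds where "v101" and "v1.1" diverge, assuming od, cmp and mktemp are available (mktemp is not in every POSIX edition, but Linux and the BSDs ship it):

```shell
#!/bin/sh
# Hex codes of "." and "0": expect 2e and 30.
printf '.0' | od -An -tx1

# cmp -l lists each differing byte as: offset (1-based), byte1, byte2 in octal.
# Octal 60 is 0x30 ("0") and octal 56 is 0x2e ("."), so byte-wise
# "v1.1" comes before "v101".
a=$(mktemp) && b=$(mktemp) || exit 1
printf 'v101' > "$a"
printf 'v1.1' > "$b"
cmp -l "$a" "$b" || true   # cmp exits 1 when inputs differ, which is expected
rm -f "$a" "$b"
```

Only byte 3 differs, which matches the LC_ALL=C result: "v1.1" first, then "v101".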
How can locales / collation make "v101" < "v1.1"?
I think the en_US.UTF-8 locale has a collation rule that strips the "." sign. These tests show that I have a point:
bash# printf 'v102\nv1.01' | LC_ALL=en_US.UTF-8 sort
v1.01
v102
bash# printf 'v102\nv1.03' | LC_ALL=en_US.UTF-8 sort
v102
v1.03
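One way to check the hypothesis is to model it directly: delete the dots to build a sort key, sort on that key in the C locale, then throw the key away. This is only a model of the assumption "dots are ignored"; the real rules live in the locale's LC_COLLATE definition and are more elaborate. emulate_ignore_dots is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical model of "the locale ignores dots": prefix each line with a
# key that has every "." removed, sort on that key byte-wise (C locale),
# then cut the key off and print the original lines.
emulate_ignore_dots() {
  tab=$(printf '\t')
  # "|| [ -n "$line" ]" keeps a last line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\t%s\n' "$(printf '%s' "$line" | tr -d .)" "$line"
  done | LC_ALL=C sort -t "$tab" -k1,1 | cut -f2
}

printf 'v101\nv1.1' | emulate_ignore_dots    # keys v101, v11
printf 'v102\nv1.01' | emulate_ignore_dots   # keys v102, v101
printf 'v102\nv1.03' | emulate_ignore_dots   # keys v102, v103
```

Each call prints the same order that sort produced under en_US.UTF-8 above, which is consistent with (though not proof of) the "dots are ignored" theory.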
Am I right? And if I am, who doesn't like dots: UTF-8, English speakers, or Americans?
Is this POSIX-compatible behavior?