I am trying to get Perl and the GNU/Linux sort(1) program to agree on how to sort Unicode strings. I'm running sort with LANG=en_US.UTF-8. In the Perl program I have tried the following methods:
- use Unicode::Collate, with $Collator = Unicode::Collate->new();
- use Unicode::Collate::Locale, with $Collator = Unicode::Collate::Locale->new(locale => $ENV{'LANG'});
- use locale
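For comparison, the three approaches can be run side by side on the strings from the error messages. This is a minimal sketch, not the original program: the locale value 'en_US' and the assumption that the en_US.UTF-8 locale is installed are mine. Only the third approach consults the same glibc collation data that sort(1) uses; the first two implement the Unicode Collation Algorithm, which is a different set of rules.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Collate;
use Unicode::Collate::Locale;
use POSIX qw(setlocale LC_COLLATE);

# Sample strings taken from the error messages in the question.
my @input = ('----,', '($1', '...', '&', '1');

# Method 1: Unicode Collation Algorithm with the default (DUCET) table.
my @by_uca = Unicode::Collate->new()->sort(@input);

# Method 2: UCA with a locale tailoring. 'en_US' is an assumed value;
# there is no English tailoring, so this falls back to the default table.
my @by_uca_locale =
    Unicode::Collate::Locale->new(locale => 'en_US')->sort(@input);

# Method 3: under 'use locale', sort/cmp follow the C library's
# LC_COLLATE rules, i.e. the same data sort(1) uses.
my @by_libc;
{
    use locale;
    # Assumes en_US.UTF-8 is installed; if it is not, setlocale
    # returns undef and the C locale stays in effect.
    setlocale(LC_COLLATE, 'en_US.UTF-8');
    @by_libc = sort @input;
}

print "UCA:        @by_uca\n";
print "UCA locale: @by_uca_locale\n";
print "use locale: @by_libc\n";
```

Printing the three lines makes it easy to see which approach, if any, reproduces the order that sort(1) writes out for the same input.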
Each of them failed, with the Perl side reporting errors such as:
- Input is not sorted: [----,] came after [($1]
- Input is not sorted: [...] came after [&]
- Input is not sorted: [($1] came after [1]
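The messages above come from a check that reads sort(1)'s output and verifies it line by line with the Perl collator. The check itself is not shown in the question, so the following is a hypothetical reconstruction: the helper name check_sorted and the file-argument usage are assumptions, and it uses Unicode::Collate's cmp method to flag the first line that collates before its predecessor.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Collate;

# Hypothetical helper: returns undef if @lines is in order under
# $collator, otherwise an error string in the question's format.
sub check_sorted {
    my ($collator, @lines) = @_;
    for my $i (1 .. $#lines) {
        my ($prev, $cur) = @lines[$i - 1, $i];
        # cmp returns 1 when $prev collates after $cur, i.e. $cur
        # should have come first.
        if ($collator->cmp($prev, $cur) > 0) {
            return "Input is not sorted: [$cur] came after [$prev]";
        }
    }
    return undef;
}

# Usage (file name is hypothetical):
#   LANG=en_US.UTF-8 sort in.txt > out.txt && perl check.pl out.txt
if (@ARGV) {
    open my $fh, '<:encoding(UTF-8)', $ARGV[0] or die "open $ARGV[0]: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    my $err = check_sorted(Unicode::Collate->new(), @lines);
    die "$err\n" if defined $err;
    print "sorted\n";
}
```

Swapping Unicode::Collate->new() for another collator object is enough to test each of the approaches listed earlier against the same sort(1) output.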
The only method that worked for me was setting LC_ALL=C for sort and treating the strings as 8-bit bytes in Perl. However, this way both sides simply compare bytes, so Unicode strings are not ordered properly.
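Why the LC_ALL=C approach agrees but orders text badly can be seen in a small sketch (the sample words are my own). With LC_ALL=C, sort(1) compares lines byte by byte; Perl's plain sort without use locale does the same on undecoded byte strings, and on decoded strings it compares code points, which gives the same order because UTF-8 preserves code-point order. Either way, a non-ASCII letter such as U+00E9 lands after every ASCII letter instead of next to e.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);
use Unicode::Collate;

binmode STDOUT, ':encoding(UTF-8)';

my @words = ("zebra", "\x{e9}clair", "apple");    # \x{e9} is U+00E9

# Plain sort (no 'use locale') compares code points: the same order
# LC_ALL=C sort(1) produces for UTF-8 input.
my @c_order = sort { $a cmp $b } @words;

# The same comparison on raw UTF-8 bytes, as with undecoded input.
my @byte_order = map { decode('UTF-8', $_) }
                 sort { $a cmp $b }
                 map { encode('UTF-8', $_) } @words;

# UCA order for contrast: U+00E9 sorts with e, before z.
my @uca_order = Unicode::Collate->new()->sort(@words);

print "C (code points): @c_order\n";     # apple zebra éclair
print "C (bytes):       @byte_order\n";  # apple zebra éclair
print "UCA:             @uca_order\n";   # apple éclair zebra
```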