
comm complains that text which has already been sorted is not in sorted order, and sort gives wrong results. For example,

printf 'G.EC\nGE.BO\nGE.DA\n' | sort

outputs

GE.BO
G.EC
GE.DA
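In en_GB.UTF-8, glibc's collation treats punctuation such as '.' as ignorable at the first comparison level, so the three strings are effectively ranked as GEBO, GEC and GEDA, which yields GE.BO, G.EC, GE.DA. The sketch below only approximates that first pass by deleting the dots and byte-sorting what remains; strip_dots_sort is a hypothetical helper for illustration, not a reimplementation of glibc collation.

```shell
# Approximation only: remove '.' and compare the remaining bytes (C locale).
# For these particular inputs this reproduces the en_GB.UTF-8 ranking;
# real locale collation has further tie-breaking levels this ignores.
strip_dots_sort() {
  tr -d '.' | LC_ALL=C sort
}

printf 'G.EC\nGE.BO\nGE.DA\n' | strip_dots_sort
```

This prints GEBO, GEC, GEDA: the same relative order the locale-aware sort produced.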

Another example is the output of ls:

STATIONS_1800
stations.1800.txt
STATIONS.50d
STATIONS.D01
STATIONS.D16
stations.e2008.txt

which should be

STATIONS_1800
STATIONS.50d
STATIONS.D01
STATIONS.D16
stations.1800.txt
stations.e2008.txt
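For comparison, plain byte order (the C locale) does put every uppercase name before the lowercase ones, but it also places '.' (0x2E) before '_' (0x5F), so STATIONS_1800 lands after the STATIONS.* names rather than first as listed above. A minimal sketch that sorts the names from the question without touching the file system:

```shell
# Sort the file names by raw byte value.
# '.' is 0x2E, '_' is 0x5F; uppercase letters (0x41-0x5A) precede lowercase (0x61-0x7A).
printf '%s\n' STATIONS_1800 stations.1800.txt STATIONS.50d \
  STATIONS.D01 STATIONS.D16 stations.e2008.txt | LC_ALL=C sort
```

The result is STATIONS.50d, STATIONS.D01, STATIONS.D16, STATIONS_1800, stations.1800.txt, stations.e2008.txt. For an actual directory, running LC_ALL=C ls applies the same byte ordering to the listing.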

The output of env | grep 'LC\|LANG' is

LANGUAGE=en_GB.UTF-8
LC_ADDRESS=en_GB.UTF-8
LC_NAME=en_GB.UTF-8
LC_MONETARY=en_GB.UTF-8
LC_PAPER=en_GB.UTF-8
LANG=en_GB.UTF-8
LC_IDENTIFICATION=en_GB.UTF-8
LC_TELEPHONE=en_GB.UTF-8
LC_MEASUREMENT=en_GB.UTF-8
LC_TIME=en_GB.UTF-8
LC_COLLATE=en_GB.UTF-8
LC_NUMERIC=en_GB.UTF-8

On another machine with the same LC_* and LANG* settings, sorting works correctly.
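In the environment above, LC_COLLATE is set explicitly, and for collation the C library consults LC_ALL first, then LC_COLLATE, then LANG. Overriding LANG alone (as tried in the comments) therefore leaves the sort order unchanged, whereas LC_ALL=C forces byte order. A minimal sketch:

```shell
# Precedence for collation: LC_ALL > LC_COLLATE > LANG.
# If LC_COLLATE is exported (as in the env output above), LANG=C changes nothing here:
printf 'G.EC\nGE.BO\nGE.DA\n' | LANG=C sort
# LC_ALL=C overrides every LC_* variable and LANG, giving byte order (G.EC first):
printf 'G.EC\nGE.BO\nGE.DA\n' | LC_ALL=C sort
```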

wsdzbm
  • what is the question? you can pipe `|sort` to get your output sorted – Tranbi Nov 23 '21 at 16:59
  • @Tranbi the order is not correct, and consequently, commands `sort`, `comm` give wrong results – wsdzbm Nov 23 '21 at 17:03
  • And to whoever voted to close: please read the Q again. This is not a trivial problem; it has caused me real trouble. – wsdzbm Nov 23 '21 at 17:07
  • oh I get it. Have you tried that? https://stackoverflow.com/a/8895544/13525512 – Tranbi Nov 23 '21 at 17:07
  • @Tranbi No change with LANG=C. In fact, I googled before asking this Q but failed to find anything related. It's very weird because everything was fine hours ago. The only thing that may be related is that I ran `sudo apt upgrade` on my Ubuntu 18.04, but that's hard to verify. – wsdzbm Nov 23 '21 at 17:10
  • `ls` should never be used in scripts -- see [Why you shouldn't parse the output of `ls`](https://mywiki.wooledge.org/ParsingLs). Thus, the only legitimate use of `ls` is interactive, and interactive shell use belongs on [unix.se] or [Super User](https://superuser.com/) instead of Stack Overflow. – Charles Duffy Nov 23 '21 at 17:27
  • ...whereas how locales interact with `sort` has many, _many_ duplicates already on this site. – Charles Duffy Nov 23 '21 at 17:28
  • @CharlesDuffy Looks like I chose an unsuitable title for this Q. I used `ls` only because it makes the problem easy to demonstrate; I don't care much about the listing itself. The real difficulty is the wrong results from commands like `comm` and `sort` in text processing. – wsdzbm Nov 23 '21 at 17:42
  • Noted -- in that case, this should have been closed as a duplicate, instead of as off-topic. – Charles Duffy Nov 23 '21 at 19:49
  • ...re: candidates for said duplicate flagging: [unix support ignores whitespaces](https://stackoverflow.com/questions/6923464), [unexpected result from GNU sort](https://stackoverflow.com/questions/2691821), [behavior of GNU sort command](https://stackoverflow.com/questions/5982531) – Charles Duffy Nov 23 '21 at 19:52

2 Answers


Set LC_COLLATE=C

$ printf 'G.EC\nGE.BO\nGE.DA\n' | sort
GE.BO
G.EC
GE.DA

$ printf 'G.EC\nGE.BO\nGE.DA\n' | LC_COLLATE=C sort
G.EC
GE.BO
GE.DA
glenn jackman

From man sort:

   ***  WARNING  ***  The locale specified by the environment affects sort order.  Set `LC_ALL=C` to get the traditional sort order that uses native byte values.
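Applied to the comm problem from the question: comm expects its inputs to be sorted under the same collation it uses itself, so running sort and comm under different locales (or different glibc collation tables) makes it report unsorted input. A minimal sketch that pins LC_ALL=C for every step; common_lines is a hypothetical helper name.

```shell
# Print lines common to two files, with sort and comm both using byte order.
common_lines() {
  a=$(mktemp) b=$(mktemp)
  LC_ALL=C sort "$1" > "$a"
  LC_ALL=C sort "$2" > "$b"
  LC_ALL=C comm -12 "$a" "$b"   # -12: suppress lines unique to either file
  rm -f "$a" "$b"
}
```

Usage: common_lines list1.txt list2.txt. Exporting LC_ALL=C once at the top of a script achieves the same consistency for every command that follows.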
Diego Torres Milano