After some initial tests, it seems that Python sorts strings in the same order as Linux `sort` (GNU sort) does under the C collation order, i.e. when the locale is set to "C".
However, I'd like to write Python code that sorts and compares strings the same way GNU sort does, depending on the current locale.
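A quick sketch to check that first observation: Python compares `str` objects by Unicode code point and ignores the locale entirely. GNU sort in the C locale compares raw bytes, and for UTF-8 text byte order happens to match code point order, so both give the same result.

```python
# Python compares str objects code point by code point, ignoring the locale.
words = ["Abd", "éfg", "aBd", "aBd", "zzz", "ZZZ", "efg", "abd", "fff"]

by_codepoint = sorted(words)

# GNU sort under LC_ALL=C compares raw bytes; for UTF-8 input, byte order
# equals code point order, so sorting the encoded bytes gives the same result.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

print(by_codepoint == by_utf8_bytes)  # True
print(by_codepoint)
print("éfg" < "fff")  # False: U+00E9 ("é") has a higher code point than "f"
```

This is why "éfg" ends up after "zzz" in a plain `sorted()` call.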
Here is a small example that illustrates the issue:
```python
import os
import subprocess

words = [
    "Abd",
    "éfg",
    "aBd",
    "aBd",
    "zzz",
    "ZZZ",
    "efg",
    "abd",
    "fff",
]

# Write one word per line so that GNU sort can read them
with open("tosort", "w", encoding="utf-8") as fout:
    for word in words:
        fout.write(word + "\n")

# GNU sort under the en_US.UTF-8 locale (the child process inherits os.environ)
os.environ["LC_ALL"] = "en_US.UTF-8"
sort_en_utf = subprocess.check_output(["sort", "tosort"]).decode("utf-8").split()

# GNU sort under the C locale (plain byte order)
os.environ["LC_ALL"] = "C"
sort_c = subprocess.check_output(["sort", "tosort"]).decode("utf-8").split()

# Python's sorted(); setting LC_ALL here has no effect on it
os.environ["LC_ALL"] = "en_US.UTF-8"
sort_py = sorted(words)

for row in zip(sort_en_utf, sort_c, sort_py):
    print(" ".join(row))
```
Running the code above gives the following output:
```
abd Abd Abd
aBd ZZZ ZZZ
aBd aBd aBd
Abd aBd aBd
efg abd abd
éfg efg efg
fff fff fff
zzz zzz zzz
ZZZ éfg éfg
```
Column 1 is the sort/comparison order I'd like to get in my Python code when the locale is "en_US.UTF-8". Columns 2 and 3 show that Python sorts the same way as Linux `sort` when the locale is set to "C".
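One possible way to get the column 1 order is a sketch along these lines, assuming the en_US.UTF-8 locale is installed and that Python's `locale` module uses the same C library collation tables as GNU sort (typical on glibc-based Linux, but an assumption here). Setting `os.environ["LC_ALL"]` only affects child processes; the Python process itself has to call `locale.setlocale`. `locale.strxfrm` then turns each string into a key whose ordinary comparison follows the locale's collation rules. `locale_sorted` is a hypothetical helper name.

```python
import locale


def locale_sorted(words, loc=""):
    """Sort words using the collation rules of the locale `loc`.

    loc="" means: take the locale from the environment (LC_ALL,
    LC_COLLATE, LANG), which is what GNU sort does.
    """
    # Load the collation rules into this process; changing os.environ
    # alone does not do this.
    locale.setlocale(locale.LC_COLLATE, loc)
    # strxfrm maps each string to a key whose plain comparison matches
    # locale.strcoll() for the current LC_COLLATE setting.
    return sorted(words, key=locale.strxfrm)


words = ["Abd", "éfg", "aBd", "aBd", "zzz", "ZZZ", "efg", "abd", "fff"]
try:
    print(locale_sorted(words, "en_US.UTF-8"))
except locale.Error:
    print("en_US.UTF-8 is not available on this system")
```

Note that `setlocale` changes process-wide state, so it is best called once at program start rather than inside library code or from multiple threads.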
So I'd also like to know whether there is a way to make:
"éfg" < "fff"
yield True. It doesn't have to be the comparison operator; calling a function is fine too, but the result should respect the current locale.
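For the single comparison, a minimal sketch using `locale.strcoll`, which returns a negative, zero, or positive number like a C comparison function. It assumes the locale named by the environment is installed and falls back to "C" otherwise; `locale_less` is a hypothetical helper name.

```python
import locale


def locale_less(a, b):
    """Return True if a sorts before b under the current LC_COLLATE locale."""
    return locale.strcoll(a, b) < 0


try:
    # Use the locale from the environment, as GNU sort does
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    # The environment names a locale that is not installed; use plain C order
    locale.setlocale(locale.LC_COLLATE, "C")

# True under glibc's en_US.UTF-8, False under the C locale
print(locale_less("éfg", "fff"))
```

If you need a comparison function for `sorted()` instead of a key, `functools.cmp_to_key(locale.strcoll)` should give the same order as `key=locale.strxfrm`.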