Does Python's str.__lt__ or sorted order characters by their Unicode code point or by some locale-dependent collation rules?
- possible duplicate of [String Comparison Technique Used by Python](http://stackoverflow.com/questions/4806911/string-comparison-technique-used-by-python) – Adriano Repetti Oct 22 '14 at 10:50
- @AdrianoRepetti, the question you linked seems to be more about the principle of lexicographic ordering. – Aivar Oct 22 '14 at 11:03
- no, it's the same question you asked (does "<" perform a culture-aware comparison?) – Adriano Repetti Oct 22 '14 at 11:08
1 Answer
No, string ordering does not take locale into account. It is based entirely on the Unicode codepoint sort order.
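A quick way to see this is to sort a few strings and look at the code points of their first characters. This is only an illustrative sketch; the sample words are made up for the example:

```python
# Default ordering compares strings code point by code point.
words = ["zebra", "Apple", "äpfel", "apple"]
by_code_point = sorted(words)

# Code points of the first characters, in sorted order.
first_code_points = [ord(word[0]) for word in by_code_point]

print(by_code_point)       # uppercase ASCII first, non-ASCII letters last
print(first_code_points)
```

Uppercase "A" (65) sorts before lowercase "a" (97), and "ä" (228) sorts after "z" (122), which is not what a German or English dictionary would do.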
The locale module does provide you with a locale.strxfrm() function that can be used for locale-specific sorting:
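import locale
# strxfrm follows the current LC_COLLATE setting; "" selects the user's default locale
locale.setlocale(locale.LC_ALL, "")
sorted(list_of_strings, key=locale.strxfrm)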
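Because locale.strxfrm depends on process-wide locale state, and the requested locale may not be installed, it can help to wrap the call. The helper below is a hypothetical sketch (the name sorted_for_locale and the fallback to code point order are assumptions, not part of the locale module):

```python
import locale


def sorted_for_locale(strings, locale_name):
    """Sort strings using the collation rules of locale_name, if available."""
    # Calling setlocale with only a category queries the current setting.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        # The locale is not installed: fall back to code point order.
        return sorted(strings)
    try:
        return sorted(strings, key=locale.strxfrm)
    finally:
        # Restore the previous collation locale for the rest of the process.
        locale.setlocale(locale.LC_COLLATE, previous)
```

For example, sorted_for_locale(names, "de_DE.UTF-8") would use German collation where that locale exists. Locale names are platform-specific, and setlocale is not thread-safe, so this pattern is best kept to single-threaded scripts.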
locale.strxfrm is quite limited; for any serious collation task you probably want to use the PyICU library:
import icu  # current PyICU releases are imported as "icu"; older ones used "import PyICU"
collator = icu.Collator.createInstance(icu.Locale(locale_spec))
sorted(list_of_strings, key=collator.getSortKey)

Martijn Pieters
- [it may fail](http://stackoverflow.com/q/3412933/4279), [PyICU could be used instead](http://stackoverflow.com/a/16701346/4279) – jfs Oct 22 '14 at 10:57