Do Python's str.__lt__ and sorted() order strings by Unicode code point, or by some locale-dependent collation rules?

jfs
Aivar

1 Answer

String ordering does not take locale into account. It is based entirely on Unicode code point order: strings are compared character by character, by code point value.
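As a quick illustration (the word list here is made up for the example), plain sorted() puts every uppercase ASCII letter before every lowercase one, and accented letters after both, because that is how their code points are numbered:

```python
# Hypothetical sample data; "\u00e9" is the precomposed character é (U+00E9).
words = ["zebra", "apple", "Zebra", "\u00e9clair"]

# Default ordering compares code points: 'Z' (90) < 'a' (97) < 'z' (122) < 'é' (233).
by_code_point = sorted(words)
print(by_code_point)  # ['Zebra', 'apple', 'zebra', 'éclair']

# The same order falls out of sorting on the explicit code point sequences.
by_ord = sorted(words, key=lambda s: [ord(c) for c in s])
print(by_code_point == by_ord)  # True
```

Most dictionaries would put "éclair" between "apple" and "zebra", which is exactly what code point ordering does not do.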

The locale module does provide a locale.strxfrm() function that can be used for locale-specific sorting, according to the current LC_COLLATE setting:

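import locale

# strxfrm follows the current LC_COLLATE setting; "" selects the user's default locale
locale.setlocale(locale.LC_COLLATE, "")
sorted(list_of_strings, key=locale.strxfrm)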
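Since locale.setlocale() changes state for the whole process, one way to keep that contained is to switch LC_COLLATE only for the duration of the sort. This is just a sketch: collation_locale and locale_sorted are names I made up for it, and any locale name you pass in has to be installed on your system.

```python
import locale
from contextlib import contextmanager


@contextmanager
def collation_locale(name):
    """Temporarily switch LC_COLLATE, restoring the previous setting on exit."""
    previous = locale.setlocale(locale.LC_COLLATE)  # no second argument: query only
    try:
        # Raises locale.Error if the named locale is not available.
        locale.setlocale(locale.LC_COLLATE, name)
        yield
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


def locale_sorted(strings, name=""):
    """Sort strings using the collation rules of the given locale.

    The default "" means the user's default locale from the environment.
    """
    with collation_locale(name):
        return sorted(strings, key=locale.strxfrm)
```

You would call it as, say, locale_sorted(list_of_strings, "de_DE.UTF-8"), assuming that locale exists on the machine. Note that this is still not thread-safe, because the locale is process-wide while the block is active.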

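locale.strxfrm() is quite limited: it relies on process-wide locale state and on whichever locales the operating system happens to provide. For any serious collation task you probably want to use the PyICU library: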

import icu  # current PyICU releases are imported as `icu`

# locale_spec is a locale identifier string, e.g. "de_DE"
collator = icu.Collator.createInstance(icu.Locale(locale_spec))
sorted(list_of_strings, key=collator.getSortKey)
Martijn Pieters