
I want to sort Persian letters with Python's sorted function, but it does not give the order I expect:

>>> sorted(['ا', 'ب', 'پ', 'ح', 'س', 'ص', 'ف', 'ک', 'ک', 'ک', 'م', 'م'])
['ا', 'ب', 'ح', 'س', 'ص', 'ف', 'م', 'م', 'پ', 'ک', 'ک', 'ک']
majid lesani
  • Could you be more specific about "does not work well"? What is the error? What is the expected output? – Dani Mesejo Jan 31 '19 at 10:45
  • In the Persian alphabet, پ comes right after ب, but here it does not. The problem is that sorted uses the Arabic order in which the letters are encoded in Unicode. – majid lesani Jan 31 '19 at 10:48
  • You will probably need to set a locale to sort these strings. Try the suggestions here: https://stackoverflow.com/questions/11121636/sorting-list-of-string-with-specific-locale-in-python – bhathiya-perera Jan 31 '19 at 10:53
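A minimal sketch of the locale-based approach from the comment above, assuming the operating system provides a Persian locale named fa_IR.UTF-8 (the name is platform-specific and the locale may not be installed at all). locale.strxfrm turns each string into a key that follows the collation rules of the current LC_COLLATE setting, so the result is only as good as the rules that locale defines. The helper name sort_with_locale is hypothetical.

```python
import locale


def sort_with_locale(words, locale_name="fa_IR.UTF-8"):
    """Sort words using the collation rules of locale_name.

    Assumption: locale_name is installed on this system; the name is
    platform-specific. Returns None if the locale is unavailable.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None  # locale not installed on this machine
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        # setlocale changes process-wide state, so restore the old value
        locale.setlocale(locale.LC_COLLATE, previous)


letters = ['ا', 'ب', 'پ', 'ح', 'س', 'ص', 'ف', 'ک', 'ک', 'ک', 'م', 'م']
result = sort_with_locale(letters)
print(result if result is not None else "locale fa_IR.UTF-8 is not available")
```

Because setlocale affects the whole process, this approach is not thread-safe; a collator object such as PyICU's does not have that problem.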

2 Answers


Try using PyICU:

import icu  # PyICU

# ICU locale IDs do not include an encoding, so 'fa_IR' is enough
collator = icu.Collator.createInstance(icu.Locale('fa_IR'))
letters = ['ا', 'ب', 'پ', 'ح', 'س', 'ص', 'ف', 'ک', 'ک', 'ک', 'م', 'م']
print(sorted(letters, key=collator.getSortKey))
Karl Knechtel
Ria

No, it works correctly: sorted compares strings by the Unicode code points of their characters. Here is the code point of each character:

ا : \u0627
ب : \u0628
ح : \u062d
س : \u0633
ص : \u0635
ف : \u0641
م : \u0645
پ : \u067e
ک : \u06a9
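The table above can be reproduced with ord, which returns a character's code point. For one-character strings, plain sorted() orders them by exactly these numbers, which is why پ and ک end up after م.

```python
letters = ['ا', 'ب', 'پ', 'ح', 'س', 'ص', 'ف', 'ک', 'ک', 'ک', 'م', 'م']

# Print each distinct letter with its code point, in the order sorted() uses.
for ch in sorted(set(letters)):
    print(ch, ':', f'\\u{ord(ch):04x}')

# Sorting by the code point explicitly gives the same result as plain sorted().
print(sorted(letters) == sorted(letters, key=ord))  # True
```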

As you can see, the code point of پ is \u067e, while that of ب is \u0628. The reason is that ب, like ا, ح, س, ص, ف and م, is one of the basic Arabic letters, whereas پ and ک are additional letters (used in Persian) that Unicode places after the basic Arabic letters in its Arabic block.
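If installing PyICU is not an option, one dependency-free workaround is to sort by each letter's position in an explicitly listed Persian alphabet instead of by its code point. This is a minimal sketch under stated assumptions: PERSIAN_ALPHABET and persian_key are hypothetical names, the alphabet string is the standard 32-letter Persian order written out by hand (not taken from Unicode or any library), it ignores diacritics and variants such as آ or the Arabic look-alikes ي and ك, and any character not in the list falls back to code-point order after the listed letters.

```python
# Standard 32-letter Persian alphabet, written out by hand (assumption:
# this plain ordering is enough; real collation also handles آ, diacritics,
# and Arabic look-alike letters such as ي and ك).
PERSIAN_ALPHABET = 'ابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی'
RANK = {ch: i for i, ch in enumerate(PERSIAN_ALPHABET)}


def persian_key(word):
    """Map each character to (0, alphabet position) if it is a listed Persian
    letter, otherwise to (1, code point) so unlisted characters sort last."""
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in word]


letters = ['ا', 'ب', 'پ', 'ح', 'س', 'ص', 'ف', 'ک', 'ک', 'ک', 'م', 'م']
print(sorted(letters, key=persian_key))
# ['ا', 'ب', 'پ', 'ح', 'س', 'ص', 'ف', 'ک', 'ک', 'ک', 'م', 'م']
```

Because the key is a list of tuples, whole words compare letter by letter, and a word that is a prefix of another sorts first, just as with ordinary string comparison.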

Anwarvic