
I'm not looking for a workaround; I want to understand why Python sorts this way.

>>> a = ['aaa','Bbb']
>>> a.sort()
>>> print(a)
['Bbb', 'aaa']

>>> a = ['aaa','bbb']
>>> a.sort()
>>> print(a)
['aaa', 'bbb']
wim
Matt
  • By default it sorts by ASCII value (or Unicode code point for Unicode strings), where uppercase letters have lower numbers than lowercase letters. – Michael Butscher Jan 18 '19 at 03:53
  • 66 is less than 97 :D – Mulan Jan 18 '19 at 04:19
  • @MichaelButscher "uppercase letters have lower numbers than lowercase letters" is not true across all of Unicode (even if you attempt to make pairs of lower and upper forms of the same letter). Matt, you can consider it a fixed, arbitrary ordering since you aren't specifying any [text sorting rules](http://cldr.unicode.org/), such as via a locale. – Tom Blodget Jan 18 '19 at 07:15
  • related [Sorting list of string with specific locale in python](https://stackoverflow.com/q/11121636/4279) – jfs Jan 19 '19 at 21:56

3 Answers


This is because uppercase characters have lower ASCII values than lowercase characters. Hence, when strings are sorted in increasing order, uppercase letters come before lowercase ones.

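  • The ASCII value of A is 65
  • The ASCII value of a is 97

65 < 97

So 'A' < 'a', and in an increasing sort 'A' comes first. Python compares strings character by character, so 'Bbb' vs 'aaa' is decided by the first characters, 'B' (66) and 'a' (97), which is why 'Bbb' sorts before 'aaa'.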
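A small sketch (plain Python 3, built-ins only) that makes the character-by-character comparison explicit. `first_difference` is a hypothetical helper written for illustration, not part of Python; it walks both strings and reports the first pair of characters that differ, which is the pair that decides the default ordering.

```python
# Hypothetical helper for illustration: find the first position where two
# strings differ and report those characters with their code points.
def first_difference(s, t):
    for x, y in zip(s, t):
        if x != y:
            return x, ord(x), y, ord(y)
    return None  # one string is a prefix of the other (or they are equal)

print(first_difference('Bbb', 'aaa'))  # ('B', 66, 'a', 97)
print('Bbb' < 'aaa')                   # True, because 66 < 97
print(sorted(['aaa', 'Bbb']))          # ['Bbb', 'aaa']
```

The rest of the strings never matter here: once 'B' and 'a' differ, the comparison is over.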

mrid

str is sorted based on the raw byte values (Python 2) or Unicode code point values (Python 3). In ASCII (and therefore in the ASCII range of Unicode), all capital letters have lower values than all lowercase letters, so they sort before them:

>>> ord('A'), ord('Z')
(65, 90)
>>> ord('a'), ord('z')
(97, 122)
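A quick check of that claim, using only the standard library (`string.ascii_uppercase` and `string.ascii_lowercase`). It also shows why the rule is limited to ASCII: 'À' (U+00C0) is a capital letter whose code point is larger than that of 'a', which is the kind of case Tom Blodget's comment on the question points at.

```python
import string

# Code point ranges of the ASCII letters.
upper = [ord(c) for c in string.ascii_uppercase]
lower = [ord(c) for c in string.ascii_lowercase]
print(min(upper), max(upper))  # 65 90
print(min(lower), max(lower))  # 97 122

# Within ASCII, every capital letter has a smaller code point than every lowercase letter.
print(max(upper) < min(lower))  # True

# Outside ASCII this no longer holds: 'À' is uppercase but sorts after 'a'.
a_grave = '\u00c0'  # 'À', LATIN CAPITAL LETTER A WITH GRAVE
print(ord(a_grave), a_grave.isupper(), sorted([a_grave, 'a']))  # 192 True ['a', 'À']
```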

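Some locales (e.g. en_US) change this ordering. If you pass locale.strxfrm as the key function, you'll get a dictionary-style ordering on those locales, where letters are compared alphabetically before case is taken into account (the exact behavior depends on the platform's locale data), e.g.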

>>> import locale
>>> a = ['aaa', 'Bbb']
>>> locale.setlocale(locale.LC_COLLATE, 'en_US.utf-8')
>>> a.sort(key=locale.strxfrm)
>>> a
['aaa', 'Bbb']
ShadowRanger
  • str is not a sequence of bytes on Python 3; it is a sequence of Unicode code points there, though the values are the same in the ASCII range. – jfs Jan 19 '19 at 21:55
  • @jfs: True. I could have sworn the OP had somehow marked this as Python 2 specifically, but clearly not; maybe it was some other question I read around the same time. – ShadowRanger Jan 20 '19 at 00:23
  • Yes, the first revision of the question asks about Python 2 specifically. – jfs Jan 20 '19 at 06:50
  • Now that we've seen technically why, I still wonder: practically, why would anyone want this? – Matt Jan 21 '19 at 16:39

Python treats uppercase letters as less than lowercase letters. If you want to sort while ignoring case, you can do something like this:

a = ['aaa','Bbb']
a.sort(key=str.lower)
print(a)

Outputs:
['aaa', 'Bbb']

This ignores case: with key=str.lower, sort compares the lowercased version of each string instead of the string itself, so 'Bbb' is compared as 'bbb'. The Python Sorting HOWTO explains key functions in more detail: https://docs.python.org/3/howto/sorting.html
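A sketch of two refinements to the str.lower approach, using only built-ins. `str.casefold` is Python's more aggressive normalization for case-insensitive comparison (German 'ß' folds to 'ss', while `lower()` leaves it unchanged), and a tuple key `(s.casefold(), s)` makes the result deterministic when two strings differ only by case. The word list is made up for illustration.

```python
words = ['banana', 'Apple', 'apple', 'Cherry']

# Plain sort: all capitals come first because of their lower code points.
print(sorted(words))                 # ['Apple', 'Cherry', 'apple', 'banana']

# Case-insensitive sort; ties ('Apple' vs 'apple') keep their input order
# because Python's sort is stable.
print(sorted(words, key=str.lower))  # ['Apple', 'apple', 'banana', 'Cherry']

# casefold() also handles characters that lower() leaves alone.
sharp_s = '\u00df'  # 'ß', LATIN SMALL LETTER SHARP S
print(sharp_s.lower(), sharp_s.casefold())  # ß ss

# Tuple key: compare case-insensitively first, then fall back to the
# original string, so 'Apple' and 'apple' always end up in the same order.
print(sorted(['apple', 'Apple'], key=lambda s: (s.casefold(), s)))  # ['Apple', 'apple']
```

The tuple key is a design choice rather than a requirement: without it, strings that differ only by case keep whatever order they had in the input.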

farstop