If charA and charB are actually characters (that is, length-1 strings), then charA < charB iff ord(charA) < ord(charB).
That is, if the Unicode code point of charA is a smaller number than the Unicode code point of charB, then charA is the smaller character.
Notice that this means that 'Z' < 'a', because in Unicode, all of the capital letters A-Z come before the lowercase letters a-z:
>>> ord('A')
65
>>> ord('Z')
90
>>> ord('a')
97
>>> 90 < 97 # of course
True
>>> 'Z' < 'a' # possibly surprising
True
If you want some kind of "friendly" comparison, you have to ask for it explicitly.
Often, you just want str.casefold, which aggressively gets rid of case information, so that, e.g., 'A' and 'a' can be treated the same:
>>> 'A'.casefold()
'a'
>>> 'Z'.casefold() < 'a'.casefold()
False
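The same idea carries over to sorting whole strings: passing str.casefold as the key should make sorted ignore case. Here is a minimal sketch; the helper name sort_ignoring_case and the sample words are made up for illustration:

```python
def sort_ignoring_case(words):
    """Return the words sorted by their case-folded form."""
    return sorted(words, key=str.casefold)


words = ['banana', 'Zebra', 'apple', 'Cherry']

# Plain sorting compares code points, so capitalized words come first.
default_order = sorted(words)
# -> ['Cherry', 'Zebra', 'apple', 'banana']

# Case-folded sorting treats 'C' like 'c' and 'Z' like 'z'.
friendly_order = sort_ignoring_case(words)
# -> ['apple', 'banana', 'Cherry', 'Zebra']
```

Note that the key only affects the comparison; the original strings come back with their case intact.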
For full generality, you probably want something like the Unicode Collation Algorithm. But Python doesn't come with that built in, so you'd need a third-party library like pyuca.
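As a rough sketch of what that could look like, assuming pyuca is installed (pip install pyuca) and using its Collator class and sort_key method: the helper name collation_sorted is hypothetical, and the casefold fallback is only there so the snippet still runs without pyuca; it is not equivalent to real collation.

```python
def collation_sorted(words):
    """Sort words with the Unicode Collation Algorithm when pyuca is available."""
    try:
        from pyuca import Collator  # third-party package, not in the stdlib
    except ImportError:
        # Fallback for this sketch only: case-insensitive code point order.
        return sorted(words, key=str.casefold)
    # Collator() loads pyuca's bundled collation table; sort_key builds
    # a comparable key for each string.
    return sorted(words, key=Collator().sort_key)


result = collation_sorted(['banana', 'Zebra', 'apple', 'Cherry'])
```

For plain ASCII words like these, both paths should give the same order; the difference shows up with accented letters and other scripts, where casefold alone still falls back on raw code points.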