Assume I have a class called Subject:

class Subject():
    def __init__(self, name, longName):
        self.name = name
        self.long_name = longName

    def __repr__(self):
        return self.long_name + "(" + self.name + ")"

In my code, I create a bunch of these objects, assign name and long_name, then sort the list alphabetically with

from operator import attrgetter

sorted(subjects, key=attrgetter("long_name"))

The list that I get looks like this:

[BETRIEBSWIRTSCHAFT (BW), MATHEMATIK (MA), WIRTSCHAFTSINFORMATIK (WI), WIRTSCHAFTSLEHRE (WW), fio (fio), ÜBUNGSRATHAUS (ÜR)]

That's not right: the lowercase entry and the one starting with an umlaut both end up at the end. How can I properly sort a list of objects alphabetically by an attribute, taking into account uppercase/lowercase and Unicode characters like umlauts?

In the end, the list should look like this:

[BETRIEBSWIRTSCHAFT (BW), fio (fio), MATHEMATIK (MA), ÜBUNGSRATHAUS (ÜR), WIRTSCHAFTSINFORMATIK (WI), WIRTSCHAFTSLEHRE (WW)]

Peter W.

1 Answer

When sorting words that mix upper and lower case, it helps to convert them all to the same case first.

>>> sbjs = [('fio', 'fio'),
...         ('MA', 'MATHEMATIK'),
...         ('ÜR', 'ÜBUNGSRATHAUS'),
...         ('BW', 'BETRIEBSWIRTSCHAFT'),
...         ('WI', 'WIRTSCHAFTSINFORMATIK'),
...         ('WW', 'WIRTSCHAFTSLEHRE')]
>>> subjects = [Subject(x, y) for x, y in sbjs]

>>> sorted(subjects, key=lambda x: x.long_name)
[BETRIEBSWIRTSCHAFT(BW), MATHEMATIK(MA), WIRTSCHAFTSINFORMATIK(WI), WIRTSCHAFTSLEHRE(WW), fio(fio), ÜBUNGSRATHAUS(ÜR)]

>>> sorted(subjects, key=lambda x: x.long_name.lower())
[BETRIEBSWIRTSCHAFT(BW), fio(fio), MATHEMATIK(MA), WIRTSCHAFTSINFORMATIK(WI), WIRTSCHAFTSLEHRE(WW), ÜBUNGSRATHAUS(ÜR)]
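
The umlaut entry still lands at the end because Python compares strings by Unicode code point, and both `Ü` and `ü` come after every ASCII letter. A small illustration, using just a few of the long names from above:

```python
# Plain string comparison in Python goes by Unicode code point.
names = ["fio", "MATHEMATIK", "ÜBUNGSRATHAUS", "WIRTSCHAFTSLEHRE"]

# Uppercase ASCII < lowercase ASCII < Ü < ü
for ch in "MWfwÜü":
    print(ch, ord(ch))

# Default sort: all uppercase ASCII first, then lowercase, then the umlaut.
default_order = sorted(names)

# Lowercasing fixes the case problem, but 'ü' (252) is still greater than 'w' (119).
lowered_order = sorted(names, key=str.lower)

print(default_order)
print(lowered_order)
```

So lowercasing alone can never move ÜBUNGSRATHAUS in front of the W entries; that needs a key that knows about the alphabet, not just code points.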

UPDATE:
To sort the umlauts correctly you need locale-aware collation, which the ICU library provides. Install its Python bindings, PyICU; `pip install pyicu` worked on my machine.

>>> import icu

>>> collator = icu.Collator.createInstance(icu.Locale('de_DE.UTF-8'))

>>> sorted(subjects, key=lambda x: collator.getSortKey(x.long_name.lower()))
[BETRIEBSWIRTSCHAFT(BW), fio(fio), MATHEMATIK(MA), ÜBUNGSRATHAUS(ÜR), WIRTSCHAFTSINFORMATIK(WI), WIRTSCHAFTSLEHRE(WW)]
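
If installing PyICU is not an option, a rough stdlib-only approximation is to strip the diacritics before comparing, so that Ü sorts like U. This is only a sketch, not real locale-aware collation: it assumes that treating umlauts as their base letters is acceptable for your data, and the helper name `fold_key` is made up for this example.

```python
import unicodedata


class Subject:
    def __init__(self, name, long_name):
        self.name = name
        self.long_name = long_name

    def __repr__(self):
        return self.long_name + "(" + self.name + ")"


def fold_key(text):
    """Hypothetical helper: casefold, then drop combining marks (Ü -> u)."""
    # NFKD splits 'ü' into 'u' plus a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


sbjs = [('fio', 'fio'),
        ('MA', 'MATHEMATIK'),
        ('ÜR', 'ÜBUNGSRATHAUS'),
        ('BW', 'BETRIEBSWIRTSCHAFT'),
        ('WI', 'WIRTSCHAFTSINFORMATIK'),
        ('WW', 'WIRTSCHAFTSLEHRE')]
subjects = [Subject(x, y) for x, y in sbjs]

result = sorted(subjects, key=lambda s: fold_key(s.long_name))
print(result)
# [BETRIEBSWIRTSCHAFT(BW), fio(fio), MATHEMATIK(MA), ÜBUNGSRATHAUS(ÜR), WIRTSCHAFTSINFORMATIK(WI), WIRTSCHAFTSLEHRE(WW)]
```

Note that this key cannot tell "Uber" from "Über"; such ties simply keep their input order. For anything beyond a quick fix, the ICU collator above is the more reliable choice.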
Haleemur Ali
  • While this solves the problem of uppercase/lowercase sorting, it still doesn't take the last entry in the list into account. Any ideas? – Peter W. Oct 26 '14 at 16:51
  • You're right, the last element is not in the right place. I did some digging, and this topic has been answered on SO: http://stackoverflow.com/questions/1097908/how-do-i-sort-unicode-strings-alphabetically-in-python – Haleemur Ali Oct 26 '14 at 16:59
  • I just spent three hours setting up ICU4C and PyICU on two different machines, but it was totally worth it. Thanks! :D – Peter W. Oct 26 '14 at 21:56
  • Note for future people: If `pip install pyicu` fails with something like "fatal error: unicode/utypes.h: No such file or directory", try `apt-get install libicu-dev` first. – Peter W. Mar 13 '15 at 22:06