
I have a Python dict that looks like this:

file_dict = {'a.txt' : <#:text_object>, 'b.txt': <#:text_object>, 'c.txt': <#:text_object>}

The value text_object is an object that contains a bunch of analytic data about the file. It's an object like this...

class text_object:
    # ...a bunch of methods and setters...
    def get_word_count(self):
        return self.word_count  # integer value

These objects are in a dict so that I can find specific files and their corresponding data quickly via their filename. I want to sort the file_dict in word_count order so I can output the objects with the smallest word count to help find anomalies in the data collection process.

How do I sort file_dict by the value that get_word_count() returns for each text_object instance stored in it?

Joshua Hedges
  • 1
    Implement the `__lt__` method to make the python sorting tools work. I think this is a dup of https://stackoverflow.com/questions/7152497/making-a-python-user-defined-class-sortable-hashable – tdelaney Aug 27 '21 at 15:18
  • 1
    The question linked as duplicate is questionable. Why implement `__lt__` when `sorted` can take a `key` argument? – Stef Aug 27 '21 at 15:22
  • 1
    I also disagree with the closure, as implementing `__lt__` for the `text_object` class still won't help in sorting the dictionary itself. – ddejohn Aug 27 '21 at 15:24
  • @blorgon has a good point. Can you clarify what you mean by sorting the dictionary? I took it to mean sorting the values, as in `sorted(file_dict.values())`. – tdelaney Aug 27 '21 at 15:28
  • Dictionaries kind of maintain insertion order (but not really, because reassigning an existing key keeps the key's original position), so you could sort and then rebuild the dict as done in an answer below. Is that what you want? You end up with a dict where its `keys()` and `values()` are sorted? – tdelaney Aug 27 '21 at 15:32
  • 2
    @stef - When you control the definition of a class and want to make it sortable, `__lt__` is better than having to dup the sort logic everywhere you want sorting. It also enables other comparisons on objects besides sorting. I think OP wants `.get_word_count()` to be the canonical sort value of the class. If it turns out OP just wants this sorting behavior in this particular case, he can mention that here in the comments and then we can remove the dup. – tdelaney Aug 27 '21 at 15:41
  • 1
    Guys, I've read the duplicate answer and it really isn't much of an answer. I don't understand your standards here because the other related question doesn't even provide a solid code example in their question. I have had admins remove a question for not having code in the question. It also isn't very clear how to implement the __lt__ method from the alternate answer. I think my question would add value to Stack Overflow so I ask that you please let it be considered for answers. – Joshua Hedges Aug 27 '21 at 19:14
  • You have both a link to a duplicate and an answer below. The duplicate indicating `__lt__` is good for when there is a single method of sorting your objects. The answer below is good when there are lots of different ways you need to sort your objects. (And the duplicate was asked 10 years ago, so 'standards' have developed since) – quamrana Aug 27 '21 at 19:22
  • 1
    @JoshuaHedges I've addressed your question of how to implement `__lt__` in an edit to my answer below. – ddejohn Aug 27 '21 at 19:54

1 Answer


I believe this'll do the trick, but I can't test it without your actual dictionary and the objects. I suggest providing a sample of your data.

sorted_keys = sorted(file_dict, key=lambda k: file_dict[k].get_word_count())
sorted_file_dict = {k: file_dict[k] for k in sorted_keys}
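Since the real dictionary wasn't posted, here is a minimal sketch for trying the two lines above. The `__init__` constructor and the word counts are made up for illustration; only `get_word_count()` comes from the question. Rebuilding the dict works because dicts preserve insertion order (guaranteed since Python 3.7).

```python
class text_object:
    """Hypothetical stand-in for the question's class; only word_count is modeled."""

    def __init__(self, word_count):  # assumed constructor, not from the question
        self.word_count = word_count

    def get_word_count(self):
        return self.word_count


# Made-up sample data.
file_dict = {
    'a.txt': text_object(120),
    'b.txt': text_object(7),
    'c.txt': text_object(45),
}

# Sort the keys by the word count of the object each key maps to...
sorted_keys = sorted(file_dict, key=lambda k: file_dict[k].get_word_count())
# ...then rebuild the dict so its insertion order is the sorted order.
sorted_file_dict = {k: file_dict[k] for k in sorted_keys}

print(list(sorted_file_dict))  # ['b.txt', 'c.txt', 'a.txt']
```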

Also, "sorting" a dictionary is a bit unnecessary. The whole point is that objects are able to be looked up via a hashtable, negating the need for any sort of ordering. If you want some kind of ordering while iterating, then you can iterate over the sorted keys. But a dictionary itself doesn't really need to be sorted, in most cases.
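If the goal is only to print the files with the smallest word counts, something like the sketch below may be enough, with no second dictionary at all. The constructor, the sample counts, and the `by_word_count` helper are assumptions for illustration; `heapq.nsmallest` is from the standard library.

```python
import heapq


class text_object:
    """Hypothetical stand-in for the question's class."""

    def __init__(self, word_count):  # assumed constructor
        self.word_count = word_count

    def get_word_count(self):
        return self.word_count


# Made-up sample data.
file_dict = {
    'a.txt': text_object(120),
    'b.txt': text_object(7),
    'c.txt': text_object(45),
}


def by_word_count(item):
    """Key function for (filename, text_object) pairs."""
    _, obj = item
    return obj.get_word_count()


# Iterate in word-count order without touching the original dict.
for name, obj in sorted(file_dict.items(), key=by_word_count):
    print(name, obj.get_word_count())

# Or pull out just the n smallest entries.
smallest_two = heapq.nsmallest(2, file_dict.items(), key=by_word_count)
smallest_names = [name for name, _ in smallest_two]
```

The original `file_dict` stays available for lookups by filename either way.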

Closure edit: I disagree with closing this as a duplicate, since implementing __lt__ for the text_object class still won't help you sort a dictionary of text_object instances. Also, heads up: class names are conventionally CamelCase, so consider renaming text_object to TextObject.

Addressing __lt__()

Since this question may not be reopened, I'll address how OP might use the information in the duplicate. To be honest, there isn't much of a difference in the end product, BUT it may not be the worst idea to implement it anyway:

class TextObject:
    # attributes and methods, etc
    def get_word_count(self):
        return self.word_count

    def __lt__(self, other):
        return self.get_word_count() < other.get_word_count()

Then, in order to "sort" your dictionary:

sorted_keys = sorted(file_dict, key=lambda k: file_dict[k])
sorted_file_dict = {k: file_dict[k] for k in sorted_keys}

Notice that the only difference is that the sorted() key function is no longer directly calling TextObject.get_word_count().

As others have mentioned, this method may be ideal for you if you're planning to do things like some_text_object < other_text_object.
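To see the `__lt__` approach end to end, here is a small runnable sketch. As before, the constructor and the sample counts are assumptions; the `__lt__` body is the one shown above.

```python
class TextObject:
    def __init__(self, word_count):  # assumed constructor, not from the question
        self.word_count = word_count

    def get_word_count(self):
        return self.word_count

    def __lt__(self, other):
        # Instances compare by word count, so sorted() and min() work on them directly.
        return self.get_word_count() < other.get_word_count()


# Made-up sample data.
file_dict = {
    'a.txt': TextObject(120),
    'b.txt': TextObject(7),
    'c.txt': TextObject(45),
}

# The key function returns the object itself; __lt__ does the comparing.
sorted_keys = sorted(file_dict, key=lambda k: file_dict[k])
sorted_file_dict = {k: file_dict[k] for k in sorted_keys}

# Direct comparison between two instances.
b_is_shorter = file_dict['b.txt'] < file_dict['a.txt']

# Filename of the object with the smallest word count.
shortest_name = min(file_dict, key=file_dict.get)
```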

PS - you may want to look into the @property decorator.
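A rough sketch of what that could look like: `word_count` becomes a read-only attribute computed on access, replacing the getter method. The `text` constructor argument and the whitespace-based counting are assumptions, since the question doesn't show how the count is produced.

```python
class TextObject:
    def __init__(self, text):  # assumed constructor taking the raw file text
        self._text = text

    @property
    def word_count(self):
        # Accessed as obj.word_count (no parentheses); counts whitespace-separated words.
        return len(self._text.split())


# Made-up sample data.
file_dict = {
    'a.txt': TextObject('one two three'),
    'b.txt': TextObject('one'),
}

sorted_keys = sorted(file_dict, key=lambda k: file_dict[k].word_count)
```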

ddejohn
  • 3
    I think it should be `.get_word_count()` and not just `.get_word_count` – Stef Aug 27 '21 at 15:19
  • 1
    This is a good alternative to making the class itself sortable via the `__lt__` method. It's a question of how often this class needs sorting. – tdelaney Aug 27 '21 at 15:22
  • Yep. I was thinking of passing a function reference to the `key` parameter lol. – ddejohn Aug 27 '21 at 15:22