0

Let's say I have the following dictionary:

full_dic = {
  'aa': 1,
  'ac': 1,
  'ab': 1,
  'ba': 2,
  ...
}

I normally use standard dictionary comprehension to remove dupes like:

t = {val : key for (key, val) in full_dic.items()}
cleaned_dic = {val : key for (key, val) in t.items()}

Calling print(cleaned_dic) outputs {'ab': 1,'ba': 2, ...}

With this code, the key that remains seems to always be the final one in the list, but I'm not sure that's even guaranteed as dictionaries are unordered. Instead, I'd like to find a way to ensure that the key I keep is the first alphabetically.

So, regardless of the 'order' the dictionary is in, I want the output to be:

>> {'aa': 1,'ba': 2, ...}

Where 'aa' comes first alphabetically.


I ran some timer tests on 3 answers below and got the following (dictionary was created with random key/value pairs):

dict length:  10
# of loops:  100000
HoliSimo (OrderedDict): 0.0000098405 seconds
Ricardo: 0.0000115448 seconds
Mark (itertools.groupby): 0.0000111745 seconds

dict length:  1000000
# of loops:  10
HoliSimo (OrderedDict): 6.1724137300 seconds
Ricardo: 3.3102091300 seconds
Mark (itertools.groupby): 6.1338266200 seconds

We can see that for smaller dictionary sizes using OrderedDict is fastest but for large dictionary sizes it's slightly better to use Ricardo's answer below.

John
  • 497
  • 2
  • 16

3 Answers3

1

You should use the OrderectDict class.

import collections
full_dic = {
  'aa': 1,
  'ac': 1,
  'ab': 1
}
od = collections.OrderedDict(sorted(full_dic.items()))

In this way you will be sure to have sorted dictionary (Original code: StackOverflow).

And then:

result = {}
for k, vin od.items():
    if value not in result.values():
        result[key] = value

I'm not sure if it will speed up the computation but you can try:

inverted_dict = {}
for k, v in od.items():
    if inverted_dict.get(v) is None:
        inverted_dict[v] = k

res = {v: k for k, v in inverted_dict.items()}
HoliSimo
  • 56
  • 3
  • Do you care about computing time? – HoliSimo Nov 17 '22 at 00:40
  • Yes. I was actually going to run some timers and see which is fastest, unless you already have that answer. – John Nov 17 '22 at 01:13
  • I rand some timer tests, which can be found as an edit in my original question. – John Nov 17 '22 at 02:13
  • I tried the second snippet, it seems to be way faster. – HoliSimo Nov 17 '22 at 09:19
  • In your 2nd bit using result.get(), are you not asking if the key is in 'result'? Since the key cannot exist in 'result' yet (due to no dupe keys in the original dict), the result will always be 'None' and therefore 'result' will simply be a copy of the original dictionary. – John Nov 18 '22 at 22:37
  • Yep, you are right sorry. What you can do is store a dict with values as keys. I will correct the snippet. – HoliSimo Nov 19 '22 at 13:38
1
t = {val : key for (key, val) in dict(sorted(full_dic.items(), key=lambda x: x[0].lower(), reverse=True)).items()}
cleaned_dic = {val : key for (key, val) in t.items()}
dict(sorted(cleaned_dic.items(), key=lambda x: x[0].lower()))
>>> {'aa': 1, 'ba': 2}
Ricardo
  • 691
  • 3
  • 11
  • Other questions on this page work very well also but this one was fastest for dictionary sizes that I will be using. See my original question for timer test results. – John Nov 17 '22 at 02:14
1

Seems like you can do this with a single sort and itertools.groupby. First sort the items by value, then key. Pass this to groupby and take the first item of each group to pass to the dict constructor:

from itertools import groupby

full_dic = {
  'aa': 1,
  'ac': 1,
  'xx': 2,
  'ab': 1,
  'ba': 2,
}
groups = groupby(sorted(full_dic.items(), key=lambda p: (p[1], p[0])), key=lambda x: x[1])
dict(next(g) for k, g in groups)
# {'aa': 1, 'ba': 2}
Mark
  • 90,562
  • 7
  • 108
  • 148