1

I would like to sort and group a dictionary by keys. The keys are currently full names, but I would like to group all last names that are similar together and combine their value pairs. An excerpt of the input dictionary is below:

facdict = {'Yimei Li': [' Ph.D.', 'Assistant Professor of Biostatistics', 'liy3@email.chop.edu'], 
'Mingyao Li': [' Ph.D.', 'Associate Professor of Biostatistics', 'mingyao@mail.med.upenn.edu'], 
'Hongzhe Li': [' Ph.D', 'Professor of Biostatistics', 'hongzhe@upenn.edu'], 
'A. Russell Localio': [' JD MA MPH MS PhD', 'Associate Professor of Biostatistics', 'rlocalio@upenn.edu']}

The desired output is:

last_name_dict = {'Li': [[' Ph.D.', 'Assistant Professor of Biostatistics', 'liy3@email.chop.edu'], [' Ph.D.', 'Associate Professor of Biostatistics', 'mingyao@mail.med.upenn.edu'], [' Ph.D', 'Professor of Biostatistics', 'hongzhe@upenn.edu']], 
'Localio': [' JD MA MPH MS PhD', 'Associate Professor of Biostatistics', 'rlocalio@upenn.edu']}

I have tried to use the following dictionary comprehension:

search = re.compile(r"([A-Z]{1}[a-z]+)")
last_name_dict = {k.replace(k, search.findall(k)[-1:][0]): v for k, v in facdict.items()}

But this returns the last names of each entry with only the first value pair associated with it.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • Its worth pointing out that it isn't possible to [sort a dictionary](https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary-by-value). – Lord Elrond Sep 12 '19 at 18:46

1 Answers1

2

A dictionary comprehension can only produce single key-value pairs; any repeated pairs are not combined, and just replace the previous value for the same key.

Just use a regular loop, and initialise the outer list with dict.setdefault():

last_name_dict = {}
for k, v in facdict.items():
    last_name = k.replace(k, search.findall(k)[-1:][0])
    last_name_dict.setdefault(last_name, []).append(v)

dictionary.setdefault(key, []) looks for the key in the dictionary and returns it. However, if the key is not yet set, the second argument is used to first set the value, before returning that object. So in the above code, the return value of last_name_dict.setdefault(...) always returns a list, so we can call .append(...) and add another entry.

If you don't mind that you won't get key errors for wrong keys, you could use a collections.defaultdict() object:

from collections import defaultdict

last_name_dict = defaultdict(list)
for k, v in facdict.items():
    last_name = k.replace(k, search.findall(k)[-1:][0])
    last_name_dict[last_name].append(v)

Take into account that last_name_dict[unknown_key] will create another list object and return that.

You can achieve the same using a dictionary comprehension if you first sort your input on last names and then group the input by the same last name value with itertools.groupby(), but this is not as efficient. The above solutions group the input in O(N) linear time; for 10 items you take 10 steps, for 100 items 100 steps, etc. Sorting takes O(NlogN) quasilinear time, where 10 items takes about 33 steps, 100 items takes about 664 steps, etc. It quickly no longer matters if sorting steps are faster, as the number of inputs grows the number of steps grows faster when sorting is required compared to when you don't need sorting and so is going to end up being slower anyway.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343