0

I want to populate a dictionary by iterating over two other dictionaries. I have a working example and i would like to know if there is a way to do it in dictionary comprehension (mainly for performance reasons) or make it more pythonic. First of all here is the code:

def get_replacement_map(dict_A, dict_B, min_sim):
    replacement_map = {}  # the dictionary i want to populate

    for key_A, value_A in dict_A.items():

        best_replacement = ()
        best_similarity = 0

        for key_B, value_B in dict_B.items():

            if key_B[0] != key_A[0]:

                # similarity(x,y) may return None so in that case assign sim = 0
                sim = similarity(value_A[0], value_B[0]) or 0
                if sim > best_similarity and sim > min_sim:
                    best_replacement = key_B
                    best_similarity = sim

                    if sim > 0.9:  # no need to keep looking, this is good enough!
                        break

        if best_replacement:
            synonym_map[key_A] = best_replacement

    return replacement_map 

It does a simple thing. It calculates the similarity between the elements of two dictionaries and for each element finds the best possible replacement (if the similarity is above the min_sim threshold). The purpose is to build a dictionary of replacements.

I am new to Python so i am pretty sure that this is not the pythonic way to implement this. I have seen big improvements in performance by using comprehensions instead of for loops, so i was curious if this code can be also done using nested dictionary comprehensions and also if that makes sense to do.

If it is not a good idea to do it using comprehensions are there any improvements i can do?

Christos Baziotis
  • 5,845
  • 16
  • 59
  • 80
  • One thing to mention (and this is more a note than an outright answer) is your use of `dict.items()`. Although definitely much better in python3 than [it was in python 2](http://stackoverflow.com/questions/12543837/python-iterating-over-list-vs-over-dict-items-efficiency), it may be worthwhile to see the performance improvement by iterating through the dict keys instead with `for key in dict`, then calling values with `dict[key]`. – R Nar Jul 21 '16 at 21:26
  • @RNar thanks i will try that. – Christos Baziotis Jul 21 '16 at 21:51

1 Answers1

0

This is a complicated enough replacement schema that if you were to contain it all in a one-liner, it would be very difficult to read. Maintaining the structure and spacing relevant to making the flow understandable is the more pythonic way to solve this.

As for performance gains, you likely won't see any as discussed in this question.

Community
  • 1
  • 1
Aaron
  • 10,133
  • 1
  • 24
  • 40
  • If I were to make an educated guess, your best performance gains would come from optimizing the `similarity()` function and using cython – Aaron Jul 21 '16 at 21:29
  • You are correct but the similarity function is a function from a 3rd-party library and there is not much i can do. – Christos Baziotis Jul 21 '16 at 21:41