Fastest way to pass function on all items in list

Question

I have a list of about 50,000 words, and I want to apply a function to each item in the list. I then want to store each original word as a key and its translated word as the corresponding value in a dictionary. Right now I know I can do this:

translations = {word: translate(word) for word in word_list}

But I think this takes too long. Is there a faster way to accomplish this?

Not sure, just wondering. Right now it seems to take quite a while and I just thought there might be a more efficient way. — vkumar, May 15 '16 at 15:08
It's very likely that the majority of your time is spent inside of `translate`. — chthonicdaemon, May 15 '16 at 15:19
Can you measure how much time translate(word) takes for each word? If that is where most of the time goes, you might need to improve the code there. — Gunjan, May 15 '16 at 15:51
Thanks, I rewrote the translate function, and it improved the speed greatly. — vkumar, May 15 '16 at 16:06
The lesson here is knowing what to optimize so you don't waste your time doing it to code that doesn't matter, which is fairly easy to do in Python — see [_How can you profile a Python script?_](http://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script) — martineau, May 15 '16 at 17:47
You may consider profiling your code, to see where it spends its time. Try https://github.com/rkern/line_profiler et al. — boardrider, May 16 '16 at 09:55
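
Following the profiling suggestions in the comments, here is a minimal sketch using the standard-library cProfile module. The translate function below is a hypothetical stand-in (the real one is not shown in the question), so the numbers only illustrate how to read the report, not where the asker's time actually went.

```python
import cProfile
import io
import pstats


def translate(word):
    # Hypothetical stand-in for the real translate(); replace with your own.
    return word.upper()


word_list = [str(i) for i in range(50000)]

# Profile only the dictionary construction.
profiler = cProfile.Profile()
profiler.enable()
translations = {word: translate(word) for word in word_list}
profiler.disable()

# Collect the entries with the largest cumulative time into a string.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

If the translate row accounts for nearly all of the cumulative time, changing how the dictionary is built is unlikely to help much.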

score 0 · Answer 1 · answered May 15 '16 at 15:17

Using map may be slightly faster than a dict comprehension:

translations = dict(zip(word_list, map(translate, word_list)))

What happens here is:

1. map applies the function to each element in word_list and returns a map object.
2. zip pairs each word from the original list with the corresponding result from that map object, producing (word, translation) tuples.
3. dict converts the resulting sequence of tuples into a dictionary.

After setting up a test program, it appears that there is a slight performance improvement. This is the code:

from datetime import datetime

def translate(wo):
    return wo.upper()

# A list of 50,000 words, as in the question.
word_list = [str(i) for i in range(50000)]

# Time the map/zip version.
d = datetime.now()
translations = dict(zip(word_list, map(translate, word_list)))
print(datetime.now() - d)

# Time the dict comprehension.
d = datetime.now()
translations = {word: translate(word) for word in word_list}
print(datetime.now() - d)

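Over a few runs, the second printed time (the dict comprehension) was consistently greater than the first (map and zip). That suggests the map version is slightly faster for this trivial translate function, but single-shot timings like these are noisy, and with an expensive translate the difference is likely negligible.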
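
For a less noisy comparison, the same test can be sketched with the standard-library timeit module, which repeats each measurement. The helper names with_comprehension and with_map are made up for this example, and translate is again a trivial stand-in.

```python
import timeit


def translate(word):
    # Hypothetical stand-in; a real translate() is likely far more expensive.
    return word.upper()


word_list = [str(i) for i in range(50000)]


def with_comprehension():
    return {word: translate(word) for word in word_list}


def with_map():
    return dict(zip(word_list, map(translate, word_list)))


# Both approaches build the same dictionary.
assert with_comprehension() == with_map()

# Take the best of several repeats to reduce noise from other processes.
for func in (with_comprehension, with_map):
    best = min(timeit.repeat(func, number=10, repeat=5))
    print(f"{func.__name__}: {best / 10:.4f} s per run")
```

Taking the minimum of the repeats is the usual way to read timeit results, since slower runs mostly reflect interference from other work on the machine.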

score 0 · Accepted Answer · answered May 15 '16 at 15:26

If you only need a few values and won't iterate over the dict, you can try doing it lazily:

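class MyDefaultDict(dict):
    def __init__(self, word_iterable, translate):
        super().__init__()
        # Words that may be translated on demand.
        self.word_set = frozenset(word_iterable)
        self.translate = translate

    def __missing__(self, key):
        # Called by dict.__getitem__ only when key is not stored yet.
        if key in self.word_set:
            translated = self.translate(key)
            self[key] = translated  # cache so each word is translated once
            return translated
        raise KeyError(key)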
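
A short usage sketch of the lazy approach, under some assumptions: the class name LazyTranslations and the three sample words are invented for illustration, and translate is a stand-in that records its calls so the laziness is visible.

```python
class LazyTranslations(dict):
    """Hypothetical name; translates a word only on its first lookup."""

    def __init__(self, words, translate):
        super().__init__()
        self.words = frozenset(words)
        self.translate = translate

    def __missing__(self, key):
        # Reached only via lazy[key] when key has not been cached yet.
        if key not in self.words:
            raise KeyError(key)
        value = self[key] = self.translate(key)
        return value


calls = []


def translate(word):
    # Stand-in translate() that records each call.
    calls.append(word)
    return word.upper()


lazy = LazyTranslations(["apple", "pear", "plum"], translate)
first = lazy["pear"]    # translated now and cached
second = lazy["pear"]   # served from the cache, no second call
print(first, calls)
```

Note that only subscript access (lazy[key]) triggers __missing__; "key in lazy" and lazy.get(key) do not, so untranslated words look absent to those operations.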
