Processing dictionaries in Python with a large amount of data

Question

I am trying to process IMDb data with Python dictionaries. After some basic data cleaning, I have a dictionary, people_dict, which looks like this:

people_dict = {...,936: ['And White Was the Night (2015)', 'Lipton Cockton in the Shadows of Sodoma (1995)', 'Maraton (1997)', 'Rundi (1990)', 'Sounds Like Suomi (2008)'],...}

where each key is the id of an actor/actress and the list contains the movies he or she has acted in.

Now I am trying to build another dictionary, movie_dict, from people_dict. It should look like this:

movie_dict = {...,'Beats, Rhymes & Life: The Travels of a Tribe Called Quest (2011)': [3],...}

where each key is a movie name and the value is a list of actor/actress ids. However, my implementation (see below) uses nested loops, and almost 100,000 movies and actors/actresses are involved. Optimistically, it would finish in about a week.

# movie_dict already maps every movie title to an empty list
for movie in movie_dict:
    for people_id, movie_list in people_dict.items():
        if movie in movie_list:  # linear scan of this person's list
            movie_dict[movie].append(people_id)
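A rough cost estimate shows why this loop is so slow. Let M be the number of movies, P the number of people, and \bar{L} the average length of a person's movie list (L_p for person p). These symbols are introduced only for this estimate, and reading "almost 100,000" as M ≈ P ≈ 10^5 is an assumption.

```latex
\begin{aligned}
T_{\text{nested}} &\approx M \cdot \sum_{p=1}^{P} L_p = M \cdot P \cdot \bar{L}
  && \text{(} M \cdot P \text{ membership tests, each scanning up to } L_p \text{ items)} \\
T_{\text{single pass}} &\approx \sum_{p=1}^{P} L_p = P \cdot \bar{L}
  && \text{(each person--movie credit visited once)} \\
\frac{T_{\text{nested}}}{T_{\text{single pass}}} &\approx M \approx 10^{5}
\end{aligned}
```

With M ≈ P ≈ 10^5 the nested version performs about 10^{10} membership tests, and a single pass over people_dict is cheaper by a factor of roughly M. That is the same order of magnitude as going from a week (604,800 seconds) to a few seconds.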

Is there anything I can do to significantly reduce the runtime? I have checked out this thread, where map seems to be a good option.

Possible duplicate of https://stackoverflow.com/questions/2823315/how-to-reverse-a-dictionary-that-it-has-repeated-values-python — cs95, May 27 '18 at 21:20
@coldspeed It is working and it just takes 5 seconds! Thank you so much! — Mr.Robot, May 27 '18 at 23:24
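A minimal sketch of inverting people_dict in a single pass with collections.defaultdict, the kind of approach discussed under "reversing a dictionary with repeated values". The function name invert_people_dict is hypothetical, and the sample data is illustrative: id 936's titles and id 3's title come from the question, while id 7 is made up.

```python
from collections import defaultdict


def invert_people_dict(people_dict):
    """Build a movie -> [people ids] mapping in one pass over people_dict.

    Hypothetical helper: every (person, movie) credit is visited exactly once,
    so the cost grows with the total number of credits, not movies x people.
    """
    movie_dict = defaultdict(list)  # missing titles start as an empty list
    for people_id, movie_list in people_dict.items():
        for movie in movie_list:
            movie_dict[movie].append(people_id)
    # Return a plain dict so looking up an unknown title raises KeyError again.
    return dict(movie_dict)


if __name__ == "__main__":
    # Illustrative sample: 936 and 3 use titles from the question, 7 is made up.
    people_dict = {
        936: ["And White Was the Night (2015)", "Maraton (1997)"],
        3: ["Beats, Rhymes & Life: The Travels of a Tribe Called Quest (2011)"],
        7: ["Maraton (1997)"],
    }
    movie_dict = invert_people_dict(people_dict)
    print(movie_dict["Maraton (1997)"])  # [936, 7]
```

Unlike the nested loop, this does not need movie_dict to be pre-filled with every title. One difference to keep in mind: if a person's list contains the same title twice, that id is appended twice here, whereas the original loop adds it only once.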
