I am trying to process IMDb data using Python dictionaries. After some basic data cleaning, I have a dictionary people_dict, which looks like this:
people_dict = {...,936: ['And White Was the Night (2015)', 'Lipton Cockton in the Shadows of Sodoma (1995)', 'Maraton (1997)', 'Rundi (1990)', 'Sounds Like Suomi (2008)'],...}
where each key is the id of an actor/actress and the value is the list of movies he/she has acted in.
Now I am trying to build another dictionary, movie_dict, based on people_dict, which looks like this:
movie_dict = {...,'Beats, Rhymes & Life: The Travels of a Tribe Called Quest (2011)': [3],...}
where the key is the name of a movie and the value is a list of actor/actress ids. However, my implementation (see below) uses nested loops, and almost 100,000 movies and actors/actresses are involved. Optimistically, it could give me what I want in a week.
    for value in movie_dict.keys():
        for people_id, movie_list in people_dict.items():
            if value in movie_list:
                movie_dict[value].append(people_id)
So is there anything I could do to significantly reduce the runtime? I have checked out this thread, where map seems to be a good option.
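
One direction that might avoid the nested loops, as a minimal sketch only: walk people_dict once and append each id to the lists of the movies it appears in, using collections.defaultdict. The helper name invert_people_dict and the sample data are hypothetical and not taken from the real data set. This does roughly one append per (person, movie) pair instead of scanning every movie list for every movie.

```python
from collections import defaultdict


def invert_people_dict(people_dict):
    """Build {movie title: [person ids]} in a single pass over people_dict."""
    movie_dict = defaultdict(list)
    for people_id, movie_list in people_dict.items():
        for movie in movie_list:
            # Each (person, movie) pair is visited exactly once.
            movie_dict[movie].append(people_id)
    return dict(movie_dict)


# Hypothetical sample data for illustration only.
sample_people = {
    3: ['Beats, Rhymes & Life: The Travels of a Tribe Called Quest (2011)'],
    936: ['Maraton (1997)', 'Rundi (1990)'],
    937: ['Rundi (1990)'],
}

sample_movies = invert_people_dict(sample_people)
print(sample_movies['Rundi (1990)'])  # [936, 937]
```

One difference from the nested-loop version: if the same title could appear twice in one person's list, this sketch would append that id twice, so the lists may need de-duplicating first.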