I am trying to find string matches in a DataFrame that contains a list of actors and the movies they acted in. My favourite actors are:

```python
my_favourite_actors = ['Clint Eastwood', 'Morgan Freeman', 'Al Pacino']
```

A sample of the dataset:

Actor | Movie |
---|---|
Morgan Freeman, Tim Robbins, Bob Gunton, William Sadler, Clancy Brown | The Shawshank Redemption |
Marlon Brando, Al Pacino, James Caan | The Godfather |
Christian Bale, Heath Ledger, Aaron Eckhart, Gary Oldman, Maggie Gyllenhaal, Morgan Freeman | The Dark Knight |
Henry Fonda, Lee Cobb, Martin Balsam | 12 Angry Men |
Liam Neeson, Ralph Fiennes, Ben Kingsley | Schindler's List |
Elijah Wood, Viggo Mortensen, Ian McKellen | The Lord of the Rings: The Return of the King |
John Travolta, Uma Thurman, Samuel Jackson | Pulp Fiction |
Clint Eastwood, Eli Wallach, Lee Van Cleef | The Good, the Bad and the Ugly |
Brad Pitt, Edward Norton, Meat Loaf | Fight Club |
Leonardo DiCaprio, Joseph Gordon-Levitt, | Inception |
I am using the following approach for the string matching, but it takes a very long time because the full dataset has almost 100K rows:
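```python
def favourite_actor(movie_dataset):
    for actor in my_favourite_actors:
        # Boolean mask: rows whose 'Actor' string contains this actor (case-insensitive)
        mask = movie_dataset['Actor'].str.contains(actor, case=False)
        # Flag matching rows; .loc avoids chained assignment, and the mask aligns
        # on index labels (the original used .iloc with label values)
        movie_dataset.loc[mask, '_IsActorFound'] = 1
```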
For each row that contains one of my favourite actors, the code sets the adjacent `_IsActorFound` column to 1.
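The loop calls `str.contains` once per favourite actor, so the whole `Actor` column is scanned once for every name. One commonly suggested alternative is to join all the names into a single regular expression and scan the column only once. This is a minimal sketch, assuming the `Actor` column holds plain strings and that setting 0 (rather than leaving the value untouched) for non-matching rows is acceptable; the `actors` parameter and the four sample rows are illustrative only.

```python
import re
import pandas as pd

my_favourite_actors = ['Clint Eastwood', 'Morgan Freeman', 'Al Pacino']

# A few rows taken from the sample table (illustrative subset)
movie_dataset = pd.DataFrame({
    'Actor': [
        'Morgan Freeman, Tim Robbins, Bob Gunton, William Sadler, Clancy Brown',
        'Marlon Brando, Al Pacino, James Caan',
        'Henry Fonda, Lee Cobb, Martin Balsam',
        'Clint Eastwood, Eli Wallach, Lee Van Cleef',
    ],
    'Movie': [
        'The Shawshank Redemption',
        'The Godfather',
        '12 Angry Men',
        'The Good, the Bad and the Ugly',
    ],
})

def favourite_actor(movie_dataset, actors):
    # One non-capturing alternation of all names; re.escape guards against
    # names that contain regex metacharacters (e.g. '.' or '(').
    pattern = '(?:' + '|'.join(map(re.escape, actors)) + ')'
    # Single vectorized pass over the column; na=False treats missing
    # Actor values as "no match".
    mask = movie_dataset['Actor'].str.contains(pattern, case=False, regex=True, na=False)
    movie_dataset['_IsActorFound'] = mask.astype(int)
    return movie_dataset

result = favourite_actor(movie_dataset, my_favourite_actors)
print(result[['Movie', '_IsActorFound']])
```

This keeps the original substring semantics of `str.contains`; how much faster it is on the real 100K-row data is worth timing rather than assuming.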
What is a fast, efficient way to match against all of these actors? My current code takes an extremely long time to execute.
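Because `str.contains` matches substrings, a favourite's name could also match inside a longer name. A different technique, exact-name matching, splits each cell into individual names, checks them with `isin`, and collapses the result back to one flag per row. This sketch assumes every `Actor` cell is a comma-separated list of names and that the DataFrame index is unique; `favourite_actor_exact` is a hypothetical name.

```python
import pandas as pd

my_favourite_actors = ['Clint Eastwood', 'Morgan Freeman', 'Al Pacino']

movie_dataset = pd.DataFrame({
    'Actor': [
        'Morgan Freeman, Tim Robbins, Bob Gunton, William Sadler, Clancy Brown',
        'Marlon Brando, Al Pacino, James Caan',
        'Henry Fonda, Lee Cobb, Martin Balsam',
        'Leonardo DiCaprio, Joseph Gordon-Levitt,',
    ],
    'Movie': ['The Shawshank Redemption', 'The Godfather', '12 Angry Men', 'Inception'],
})

def favourite_actor_exact(movie_dataset, actors):
    # Lower-case the favourites once so the comparison is case-insensitive
    wanted = {a.strip().lower() for a in actors}
    # Split each cell on commas; explode() yields one name per row while
    # keeping the original row index (empty strings from trailing commas never match)
    names = movie_dataset['Actor'].str.split(',').explode().str.strip().str.lower()
    # True where a single name is a favourite, then "any" per original row
    hits = names.isin(wanted).groupby(level=0).any()
    movie_dataset['_IsActorFound'] = hits.astype(int)
    return movie_dataset

result = favourite_actor_exact(movie_dataset, my_favourite_actors)
print(result[['Movie', '_IsActorFound']])
```

The exploded Series is longer than the original (one row per actor), so this trades some memory for exact matching; whether it beats the single-regex approach depends on the data.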