I have a dataframe of user comments on movies and would like to extract the cases where a user describes a movie as "movie1" meets "movie2":
User id Old id_New id Score Comments
947952018 3101_771355141 3.0 If you want to see a comedy and have a stupid ...
805407067 11903_18330 5.0 Argento's fever dream masterpiece. Fairy tale ...
901306244 16077_771225176 4.5 Evil Dead II meets Brothers Grimm and Hawkeye ...
901306244 NaN_381422014 1.0 Biggest disappointment! There's a host of ...
15169683 NaN_22471 3.0 You know in the original story of Pinocchio he...
I've written a function that takes a comment, finds the word "meets", and returns the n words immediately before and after it. These should (hopefully) capture the essence of the titles of movie1 and movie2, which I plan to fuzzy match later against titles in another dataframe.
def parse_movie(comment, num_words):
    # Split the comment around the first occurrence of 'meets'
    words = comment.partition('meets')
    # rsplit keeps the last num_words words before 'meets' as separate items
    words_before = words[0].rsplit(maxsplit=num_words)[-num_words:]
    words_after = words[2].split(maxsplit=num_words)[:num_words]
    movie1 = ' '.join(words_before)
    movie2 = ' '.join(words_after)
    return movie1, movie2
How can I apply this function to the Comments column of the original pandas dataframe and put the returned movie1 and movie2 titles in separate columns? I tried
df['Comments'].apply(parse_movie)
but that way I cannot specify the num_words I'd like to use. Calling the function directly on the column doesn't work either, and I'm not sure how to put the returned titles into new columns:
parse_movie(sample['Comments'], 4)
AttributeError: 'Series' object has no attribute 'partition'
Suggestions would be appreciated!
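One possible approach, as a minimal sketch rather than a definitive answer: Series.apply calls the function once per element, so each call receives a single string (which avoids the AttributeError above), and extra arguments can be passed through its args parameter. The toy dataframe, the guard for comments without "meets", and the column names movie1 and movie2 are assumptions made for illustration. The sketch also uses rsplit for the words before "meets", on the assumption that the last num_words words are what is wanted.

```python
import pandas as pd

def parse_movie(comment, num_words):
    # Assumption: comments that are missing or lack 'meets' yield no titles.
    if not isinstance(comment, str) or 'meets' not in comment:
        return None, None
    before, _, after = comment.partition('meets')
    # Last num_words words before 'meets', first num_words words after it.
    words_before = before.rsplit(maxsplit=num_words)[-num_words:]
    words_after = after.split(maxsplit=num_words)[:num_words]
    return ' '.join(words_before), ' '.join(words_after)

# Toy frame standing in for the real data (hypothetical values).
df = pd.DataFrame({'Comments': [
    'Evil Dead II meets Brothers Grimm and Hawkeye',
    'Biggest disappointment!',
]})

# Extra positional arguments for parse_movie are supplied via `args`.
parsed = df['Comments'].apply(parse_movie, args=(3,))

# Each element of `parsed` is a (movie1, movie2) tuple; expand into columns.
titles = pd.DataFrame(parsed.tolist(), index=df.index,
                      columns=['movie1', 'movie2'])
df = df.join(titles)
print(df)
```

Passing the argument by keyword, df['Comments'].apply(parse_movie, num_words=3), or wrapping the call in a lambda should work equally well. Building the new columns from parsed.tolist() with the original index keeps the rows aligned with df.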