Apply a function to column of pandas dataframe

Question

I have a dataframe of user comments on movies and would like to extract cases where a user describes a movie as "movie1" meets "movie2".

User id    Old id_New id    Score  Comments
947952018  3101_771355141   3.0    If you want to see a comedy and have a stupid ...
805407067  11903_18330      5.0    Argento's fever dream masterpiece. Fairy tale ...
901306244  16077_771225176  4.5    Evil Dead II meets Brothers Grimm and Hawkeye ...
901306244  NaN_381422014    1.0    Biggest disappointment! There's a host of ...
15169683   NaN_22471        3.0    You know in the original story of Pinocchio he...

I've written a function that takes a comment, finds the word "meets", and returns the first n words before and after it. These should (hopefully) capture the essence of the titles of movie1 and movie2, which I plan to fuzzy match later against titles in another dataframe.

def parse_movie(comment, num_words):
    words = comment.partition('meets')
    # rsplit splits from the right, so the words nearest 'meets' stay separate
    words_before = words[0].rsplit(maxsplit=num_words)[-num_words:]
    words_after = words[2].split(maxsplit=num_words)[:num_words]
    movie1 = ' '.join(words_before)
    movie2 = ' '.join(words_after)
    return movie1, movie2

How can I apply this function on the comments column of the original pandas dataframe and put the returned movie1 and movie2 titles in separate columns? I tried

df['Comments'].apply(parse_movie)

but then I cannot specify the num_words I'd like to use. Calling the function directly on the column doesn't work either, and I'm not sure how to put the parsed titles into new columns.

parse_movie(sample['Comments'], 4)
AttributeError: 'Series' object has no attribute 'partition'

Suggestions would be appreciated!

You can pass arguments with `apply()` using the `args` argument. Have a look at the [docs](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html). — andrew_reece, Dec 19 '17 at 02:06
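A minimal sketch of that suggestion, assuming a small made-up dataframe and a simplified restatement of `parse_movie` (neither is the asker's actual data or exact code). `Series.apply` forwards extra positional arguments through `args` and extra keyword arguments directly:

```python
import pandas as pd

def parse_movie(comment, num_words):
    # Simplified variant of the function from the question (an assumption):
    # take up to num_words words on each side of 'meets'.
    before, _, after = comment.partition('meets')
    movie1 = ' '.join(before.split()[-num_words:])
    movie2 = ' '.join(after.split()[:num_words])
    return movie1, movie2

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'Comments': ['Evil Dead II meets Brothers Grimm and Hawkeye in the woods'],
})

# Extra positional arguments go in a tuple via args
pairs = df['Comments'].apply(parse_movie, args=(4,))

# Keyword arguments are forwarded to the function as well
pairs_kw = df['Comments'].apply(parse_movie, num_words=4)
```

Both calls return a Series of `(movie1, movie2)` tuples, which still needs to be split into two columns.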

score 1 · Accepted Answer · answered Dec 19 '17 at 02:47

This is based on an answer to "How to split a column of tuples in a pandas dataframe?". Use a lambda to pass num_words to parse_movie, then call apply(pd.Series) to expand each returned tuple into two columns, and assign the result to the new columns 'movie1' and 'movie2'.

num_words = 4
df[['movie1','movie2']] = df['Comments'].apply(lambda comment: parse_movie(comment, num_words)).apply(pd.Series)
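For reference, here is a self-contained sketch of the same idea from start to finish. The sample rows and the simplified `parse_movie` are assumptions made for illustration. Instead of a second `.apply(pd.Series)`, this variant builds the two columns from the list of tuples, which is one possible alternative and not the method used above:

```python
import pandas as pd

def parse_movie(comment, num_words):
    # Simplified variant of the question's function (an assumption):
    # take up to num_words words on each side of 'meets'.
    before, _, after = comment.partition('meets')
    movie1 = ' '.join(before.split()[-num_words:])
    movie2 = ' '.join(after.split()[:num_words])
    return movie1, movie2

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'Score': [4.5, 3.0],
    'Comments': [
        'Evil Dead II meets Brothers Grimm and Hawkeye in the woods',
        'Think of it as Alien meets Jaws',
    ],
})

num_words = 4

# Series of (movie1, movie2) tuples, one per comment
pairs = df['Comments'].apply(parse_movie, num_words=num_words)

# Expand the tuples into a two-column frame aligned on the original index,
# then assign both new columns at once
df[['movie1', 'movie2']] = pd.DataFrame(pairs.tolist(), index=df.index)

print(df[['movie1', 'movie2']])
```

Note that a comment without the word "meets" yields an empty `movie2` here, because `str.partition` returns empty strings when the separator is not found, so such rows may need filtering before fuzzy matching.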
