I have a list of bigrams.
I have a pandas DataFrame with one row per document in my corpus. I want to find which bigrams from my list appear in each document and put them in a new column of the DataFrame.
What is the best way to do this? I have searched Stack Overflow but haven't found an answer to this specific problem. The new column needs to contain every bigram from my list that is found in the document.
Any help would be appreciated!
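For reference, a minimal sketch of an exact match: split each message into words and compare every adjacent word pair against the bigram list. It assumes plain lowercase whitespace tokenization with no punctuation handling. `find_bigrams` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({'Message': ['help me with my python pandas please',
                               'machine learning is fun using svd with sklearn']})
bigrams = [('python', 'pandas'), ('function', 'input'),
           ('help', 'jupyter'), ('sklearn', 'svd')]
# A set gives fast membership tests for each candidate pair
bigram_set = set(bigrams)

def find_bigrams(text):
    # Naive tokenization: lowercase and split on whitespace (assumption)
    tokens = text.lower().split()
    # zip(tokens, tokens[1:]) yields every adjacent word pair, in order
    return [pair for pair in zip(tokens, tokens[1:]) if pair in bigram_set]

# Each cell holds a list of every matching bigram (empty list if none)
df['Match'] = df['Message'].apply(find_bigrams)
print(df)
```

Returning an empty list instead of `NaN` keeps the column type uniform, which makes later steps such as `df.explode('Match')` simpler.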
The example below shows the kind of output I am looking for. However, in my real data I used stop words when building the bigrams, so exact bigrams like these aren't found in the raw text. Is there a way to do this with some sort of string-contains check?
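```python
import numpy as np
import pandas as pd

data = [['help me with my python pandas please'],
        ['machine learning is fun using svd with sklearn']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Message'])

bigrams = [('python', 'pandas'),
           ('function', 'input'),
           ('help', 'jupyter'),
           ('sklearn', 'svd')]

def matcher(x):
    # Each bigram is a tuple, so join it into "word1 word2" before the
    # substring test (calling .lower() on a tuple raises AttributeError)
    found = [b for b in bigrams if ' '.join(b).lower() in x.lower()]
    # Return every match rather than stopping at the first bigram checked;
    # NaN when nothing matches
    return found if found else np.nan

df['Match'] = df['Message'].apply(matcher)
df
```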
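If the bigram list was built after removing stop words, one option is to apply the same stop-word removal to each message before pairing adjacent words. This is a sketch under assumptions. `STOP_WORDS` is a hypothetical stand-in for whatever list was used to generate the bigrams. `find_bigrams` and its `ordered` flag are also hypothetical; `ordered=False` additionally accepts a reversed pair, which the sample row needs because it contains "svd with sklearn" while the list has `('sklearn', 'svd')`.

```python
import pandas as pd

# Hypothetical stop-word list; in practice, reuse the exact list that was
# applied when the bigram list was generated
STOP_WORDS = {'me', 'my', 'with', 'is', 'using', 'please'}

df = pd.DataFrame({'Message': ['help me with my python pandas please',
                               'machine learning is fun using svd with sklearn']})
bigrams = [('python', 'pandas'), ('function', 'input'),
           ('help', 'jupyter'), ('sklearn', 'svd')]

def find_bigrams(text, ordered=True):
    # Drop stop words first so the remaining tokens line up the same way
    # they did when the bigram list was built
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    pairs = set(zip(tokens, tokens[1:]))
    if not ordered:
        # Also accept the reversed pair, e.g. ('svd', 'sklearn') for ('sklearn', 'svd')
        pairs |= {(b, a) for a, b in pairs}
    # Keep the bigrams in the order they appear in the original list
    return [bg for bg in bigrams if bg in pairs]

# Extra keyword arguments to Series.apply are forwarded to the function
df['Match'] = df['Message'].apply(find_bigrams, ordered=False)
print(df)
```

Whether reversed pairs should count is a judgment call for the data; leaving `ordered=True` keeps strict word order.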