I have two dataframes and I want to compare the contents of their text columns:
DF1:
df1 = pd.DataFrame({'text1': ['Today is monday', 'I need money'], 'name': ['calendar', 'finance']})
text1            name
Today is monday  calendar
I need money     finance
DF2:
df2 = pd.DataFrame({'text2': ['My car', 'Saturday Night', 'Lost my Wallet']})
text2
My car
Saturday Night
Lost my Wallet
Then I perform an NLP task that compares the text in df1 and df2:
df = pd.DataFrame()
for i in df1['text1']:
    for j in df2['text2']:
        s = nlp(i)
        p = nlp(j)
        out = s.similarity(p)  # similarity score between the two texts
        df = df.append({'text1': i, 'text2': j, 'index': out}, ignore_index=True)
And the output is:
text1            text2           index
Today is monday  My car          1
Today is monday  Saturday Night  0.23
Today is monday  Lost my Wallet  0.40
I need money     My car          0.34
I need money     Saturday Night  0.13
I need money     Lost my Wallet  0.20
The output above works just fine: I am just comparing the text1 and text2 values. What index means is not important for this problem.
I want to change the output to also include the df1['name'] column, but I don't know how to do it, as my loop iterates over the text column only.
Expected Output:
text1            name      text2           index
Today is monday  calendar  My car          1
Today is monday  calendar  Saturday Night  0.23
Today is monday  calendar  Lost my Wallet  0.40
I need money     finance   My car          0.34
I need money     finance   Saturday Night  0.13
I need money     finance   Lost my Wallet  0.20
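For reference, here is a minimal sketch of one way the expected output might be produced: iterate over text1 and name together with zip, so the name is available for every pair. The helper names compare_texts and word_overlap are hypothetical and not part of my original code. word_overlap is only a placeholder scorer so the sketch runs without an NLP model, and its scores will not match the values shown above. The rows are collected in a list and turned into a dataframe once at the end, rather than calling df.append in the loop.

```python
import pandas as pd


def compare_texts(df1, df2, similarity):
    """Return one row per (df1 row, df2 row) pair, keeping df1's 'name'."""
    rows = []
    # zip walks text1 and name in step, so each text1 keeps its own name.
    for text1, name in zip(df1['text1'], df1['name']):
        for text2 in df2['text2']:
            rows.append({
                'text1': text1,
                'name': name,
                'text2': text2,
                'index': similarity(text1, text2),
            })
    # Build the dataframe once, with the column order of the expected output.
    return pd.DataFrame(rows, columns=['text1', 'name', 'text2', 'index'])


def word_overlap(a, b):
    """Placeholder scorer (assumption): Jaccard overlap of lowercased words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)


df1 = pd.DataFrame({'text1': ['Today is monday', 'I need money'],
                    'name': ['calendar', 'finance']})
df2 = pd.DataFrame({'text2': ['My car', 'Saturday Night', 'Lost my Wallet']})

result = compare_texts(df1, df2, word_overlap)
print(result)
```

With the nlp object from the loop above, the placeholder could presumably be swapped for the real score by passing lambda a, b: nlp(a).similarity(nlp(b)) as the similarity argument.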