I have a dataframe in pandas of organisation descriptions and project titles, shown below:
Columns are df['org_name'], df['org_description'], and df['proj_title']. I want to add a column with the similarity score between the organisation description and project title, for each project (each row).
I'm trying to use gensim: https://radimrehurek.com/gensim/auto_examples/core/run_similarity_queries.html. However, I'm not sure how to adapt the tutorial to my use case, because in the tutorial we take a new query, doc = "Human computer interaction", and then compare it against each document in the corpus individually. I'm not sure where this choice is made (sims? vec_lsi?).
What I want instead is the similarity score for just the two items in a given row of the dataframe df, not one of them against the whole corpus. This should be computed for each row and then appended to df as a column. How can I do this?
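To make the desired output concrete, here is a minimal sketch of the per-row computation. It assumes a plain bag-of-words cosine similarity using only the standard library, not gensim's LSI model from the tutorial, so the scores will differ from what gensim would give. The names tokenize and row_similarity and the sample rows are hypothetical, made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def row_similarity(description, title):
    """Cosine similarity between the word-count vectors of two strings."""
    a = Counter(tokenize(description))
    b = Counter(tokenize(title))
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no meaningful similarity
    return dot / (norm_a * norm_b)


# Hypothetical rows standing in for the dataframe (assumed sample data).
rows = [
    {"org_name": "Org A", "org_description": "Clean water charity",
     "proj_title": "Clean water wells"},
    {"org_name": "Org B", "org_description": "Youth football club",
     "proj_title": "Library renovation"},
]

# One score per row: each description is compared only with its own title.
scores = [row_similarity(r["org_description"], r["proj_title"]) for r in rows]
```

With the real dataframe, the same function could presumably be applied row by row, for example df['similarity'] = df.apply(lambda r: row_similarity(r['org_description'], r['proj_title']), axis=1). The open question is how to get the equivalent pairwise score out of gensim.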