I'm trying to compute the similarity between a set of queries and the result returned for each query, using tf-idf scores and cosine similarity. The issue is that I can't figure out how to generate a tf-idf matrix from two columns of a pandas DataFrame. Concatenating the two columns runs fine, but it's awkward to use because I then have to keep track of which query belongs to which result. How would I go about calculating a tf-idf matrix for two columns at once? I'm using pandas and sklearn.
Here's the relevant code:
from sklearn.feature_extraction.text import TfidfVectorizer
tf = TfidfVectorizer(analyzer='word', min_df=0)
tfidf_matrix = tf.fit_transform(df_all['search_term'] + df_all['product_title'])  # This line is the issue
feature_names = tf.get_feature_names()
I'm trying to pass df_all['search_term'] and df_all['product_title'] as arguments to tf.fit_transform. Adding the columns clearly does not work, since it just concatenates the strings in each row, which doesn't let me compare the search_term to the product_title. Also, is there maybe a better way of going about this?
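
For reference, here is a minimal sketch of one approach that might fit, not necessarily the best one: fit a single vectorizer on the text of both columns so they share one vocabulary, transform each column separately, and take the row-wise cosine similarity. The toy df_all and the cosine_sim column name below are made up for illustration, and the sketch assumes the default norm='l2' setting of TfidfVectorizer.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for df_all; the real data would be loaded elsewhere.
df_all = pd.DataFrame({
    "search_term": ["angle bracket", "deck over", "rain shower head"],
    "product_title": [
        "simpson strong-tie 12-gauge angle",
        "behr premium textured deck over coating",
        "delta vero 1-handle faucet",
    ],
})

tf = TfidfVectorizer(analyzer="word")

# Fit on both columns stacked end to end, so the vocabulary and IDF
# weights are shared between queries and titles.
tf.fit(pd.concat([df_all["search_term"], df_all["product_title"]]))

# Transform each column on its own: row i of each matrix describes row i of df_all.
query_tfidf = tf.transform(df_all["search_term"])
title_tfidf = tf.transform(df_all["product_title"])

# With the default norm="l2" every non-empty row has unit length, so the
# row-wise dot product is the cosine similarity of each (query, title) pair.
row_dots = query_tfidf.multiply(title_tfidf).sum(axis=1)
df_all["cosine_sim"] = np.asarray(row_dots).ravel()

print(df_all[["search_term", "product_title", "cosine_sim"]])
```

Because both matrices keep the row order of df_all, there is no bookkeeping about which query belongs to which result. If all-pairs similarities were wanted instead, sklearn.metrics.pairwise.cosine_similarity(query_tfidf, title_tfidf) would return the full matrix.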