I am having trouble writing cosine similarity scores from an array back into a pandas DataFrame. I have tested the cosine similarity computation using the code below:
import torch
from sentence_transformers import util

# Assumes model, corpus, queries and corpus_embeddings are already defined
# Find the closest 5 sentences of the corpus for each query sentence based on cosine similarity
top_k = min(5, len(corpus))
for query in queries:
    query_embedding = model.encode(query, convert_to_tensor=True)

    # We use cosine-similarity and torch.topk to find the highest 5 scores
    cos_scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
    top_results = torch.topk(cos_scores, k=top_k)

    print("\n\n======================\n\n")
    print("Query:", query)
    print("\nTop 5 most similar sentences in corpus:")

    for score, idx in zip(top_results[0], top_results[1]):
        print(corpus[idx], "(Score: {:.4f})".format(score))
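The code above prints each query followed by its top 5 most similar corpus sentences and their scores.
However, instead of printing them, I want to write the similarity scores back to a DataFrame that holds the query and corpus sentences.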
Dummy data to replicate the example:
import pandas as pd

df1 = pd.DataFrame(columns=['Query', 'Corpus'])
df1['Query'] = ["A man is eating pasta"] * 5
df1['Corpus'] = ["A man is eating food","A man is eating a piece of bread.","A man is riding a horse","A man is riding a white horse on an enclosed ground","A cheetah is running behind its prey"]
df1
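A minimal sketch of the kind of result I am after: one row per (query, corpus sentence) match with its score, collected as a list of rows and turned into a DataFrame at the end. The helper name `scores_to_frame`, the `Score` column, and the numbers below are hypothetical placeholders, not real model output. I assume the real `score_matrix` could come from something like `util.cos_sim(query_embeddings, corpus_embeddings).tolist()`.

```python
import pandas as pd


def scores_to_frame(queries, corpus, score_matrix, top_k=5):
    """Build a long-format DataFrame with one row per (query, corpus) match.

    score_matrix[i][j] is assumed to hold the cosine similarity between
    queries[i] and corpus[j] as a plain float.
    """
    rows = []
    for query, scores in zip(queries, score_matrix):
        # Rank corpus indices by descending score and keep the best top_k
        ranked = sorted(range(len(corpus)), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            rows.append({"Query": query, "Corpus": corpus[j], "Score": scores[j]})
    return pd.DataFrame(rows, columns=["Query", "Corpus", "Score"])


# Hypothetical placeholder scores, NOT real model output
queries = ["A man is eating pasta"]
corpus = [
    "A man is eating food",
    "A man is riding a horse",
    "A cheetah is running behind its prey",
]
placeholder_scores = [[0.70, 0.10, 0.05]]

result = scores_to_frame(queries, corpus, placeholder_scores, top_k=2)
print(result)
```

Collecting plain dictionaries and constructing the DataFrame once avoids growing a DataFrame row by row inside the loop.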
**A detailed example can be found here:** https://www.codegrepper.com/code-examples/python/sentence+transformers
I did look at the similar questions "Cosine Similarity for Sentences in Dataframe" and "Cosine similarity of rows in pandas DataFrame"; however, they don't answer my question. Any pointers would be helpful.