Check the commonality of the two sentences

Question

To check how similar two sentences are, I used the text-embedding-ada-002 model on Azure OpenAI. However, it is not very accurate with negated sentences and antonyms. For example, for the two sentences "I hate eating candy" and "I like eating candy", the similarity is 0.927. Does this mean I'm using the wrong model, or is there something I need to adjust?

Below is the Python code I use to compute the similarity of the two sentences:

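import numpy as np
import openai

# Request embeddings for both sentences in a single call
resp = openai.Embedding.create(
    input=[dict["text1"], dict["text2"]],
    engine="solize-dokushokai-openai-embeddings")

embedding_a = resp['data'][0]['embedding']
embedding_b = resp['data'][1]['embedding']

# Dot product of the two embedding vectors, used as the similarity score
similarity_score = np.dot(embedding_a, embedding_b)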
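For reference, a dot product equals cosine similarity only when both vectors have unit length. OpenAI documents its embeddings as normalized to length 1, so the two should agree for the code above, but that is worth checking for other models. A minimal sketch of the explicit cosine computation, assuming made-up toy vectors rather than real embeddings (`cosine_similarity` is a hypothetical helper, not part of the OpenAI library):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (assumption: illustrative values, not real embeddings)
u = [3.0, 4.0]
v = [4.0, 3.0]

# The raw dot product depends on vector length...
raw_dot = sum(x * y for x, y in zip(u, v))
# ...while cosine similarity divides the lengths out.
cos = cosine_similarity(u, v)
print(raw_dot, cos)
```

Here the raw dot product is far larger than 1 because the vectors are not unit length, while the cosine stays in the range [-1, 1].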

Maybe this would go in GenAI stackexchange? – Coder Gautam YT, Aug 05 '23 at 02:47

score 0 · Answer 1 · answered Aug 05 '23 at 03:14

I think this may be due to the model's limitations, especially in capturing negations and antonyms. It may be a good idea to try a different model that is known to perform well on semantic similarity tasks.

Here are some options:

Universal Sentence Encoder (USE): developed by Google
BERT (Bidirectional Encoder Representations from Transformers)
RoBERTa: a variant of BERT that further refines the training process
Sentence-BERT (SBERT): an extension of BERT designed specifically for computing sentence embeddings

Good luck!
