
Rather than finding the similarity between two strings, I want to find the similarity between the meanings of the two strings. For example:

  1. what are the types of hyper threading
  2. is there any categories in hyper threading

should be considered similar. So far I have tried cosine similarity and word mover's distance, but I am not getting accurate results for some of the strings.
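
For illustration, a minimal sketch of this kind of cosine-similarity comparison, assuming TF-IDF vectors with scikit-learn (the exact setup is not shown in the question and may differ):

```python
# Sketch of a surface-level cosine similarity between the two example phrases,
# using TF-IDF vectors (an assumption; the original code is not shown).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "what are the types of hyper threading",
    "is there any categories in hyper threading",
]

vectors = TfidfVectorizer().fit_transform(sentences)

# Low score: the phrases only overlap on the tokens "hyper" and "threading",
# even though their meanings are close.
print(cosine_similarity(vectors[0], vectors[1]))
```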

  • If you want accurate semantic similarity, you probably need a pre-trained model. Or you can try ConceptNet Numberbatch as explained here: https://stackoverflow.com/a/53407328/5619835 (see the sketch after these comments). That title can also give you an idea. – mulaixi Oct 21 '19 at 10:33
  • Welcome to SO, which is about *specific coding* questions; your question is way too broad, please do take some time to read [How to Ask](https://stackoverflow.com/help/how-to-ask) and [What topics can I ask about here?](https://stackoverflow.com/help/on-topic). – desertnaut Oct 21 '19 at 10:38
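
One way to follow the pre-trained-model suggestion from the first comment, sketched here with sentence-transformers purely as an illustration (an assumption; the linked answer uses ConceptNet Numberbatch vectors instead):

```python
# Sketch of the "pre-trained model" route: encode each phrase into a sentence
# embedding and compare the embeddings (sentence-transformers is assumed here;
# any pre-trained sentence encoder, including Numberbatch vectors, could be used).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

embeddings = model.encode([
    "what are the types of hyper threading",
    "is there any categories in hyper threading",
])

# High score despite different wording, because the embeddings capture meaning.
print(util.cos_sim(embeddings[0], embeddings[1]))
```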

1 Answer


This is really hard to do. It is also difficult to pin down what you mean by "accurate" semantic similarity between two phrases; you need to find a good metric for it.

That said, if you have a limited context (you don't need a general-purpose semantic similarity calculator), a very basic approach is to build a text classifier (with machine learning) in which you define the principal classes you want to use.

For example, for your example phrases, you could have the two text classes:

  1. asking about hyperthreading

  2. asking about food

Then you train your model on a lot of phrases, and it outputs probabilities for your example phrases, such as:

  1. "what are the types of hyper threading":

    • asking about hyperthreading 0.9

    • asking about food 0.5

  2. "is there any categories in hyper threading"

    • asking about hyperthreading 0.8

    • asking about food 0.4

Both phrases are classified as "asking about hyperthreading" (because that class has the higher score for each), so one can assume they are similar. One could also use the probability scores to do something more sophisticated (using score differences, etc.).
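
A minimal sketch of this classifier idea, assuming scikit-learn; the training phrases are invented for illustration and a real model would need far more data:

```python
# Train a small text classifier over the two classes, then treat phrases that
# receive the same predicted class as semantically similar.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny illustrative training set; a real model needs many phrases per class.
train_texts = [
    "what is hyper threading",
    "does my cpu support hyper threading",
    "how many threads per core with hyper threading",
    "what should I eat for dinner",
    "best pizza toppings",
    "how do I cook rice",
]
train_labels = [
    "asking about hyperthreading",
    "asking about hyperthreading",
    "asking about hyperthreading",
    "asking about food",
    "asking about food",
    "asking about food",
]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

phrases = [
    "what are the types of hyper threading",
    "is there any categories in hyper threading",
]

predictions = clf.predict(phrases)
print(predictions)
# Same predicted class -> assume the phrases are similar.
print(predictions[0] == predictions[1])
# clf.predict_proba(phrases) gives the per-class scores mentioned above,
# which could feed a more sophisticated comparison.
```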

Nikaido