
Rather than finding the similarity between two strings, I want to find the similarity between the meanings of the two strings. For example:

  1. what are the types of hyper threading
  2. is there any categories in hyper threading

should be considered similar. So far I have tried cosine similarity and word mover's distance, but I am not getting accurate results for some of the strings.
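
For illustration, a minimal sketch of this kind of cosine-similarity comparison, assuming TF-IDF vectors with scikit-learn (the exact setup is not shown in the question and may differ):

```python
# Sketch of a surface-level cosine similarity between the two example phrases,
# using TF-IDF vectors (an assumption; the original code is not shown).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "what are the types of hyper threading",
    "is there any categories in hyper threading",
]

vectors = TfidfVectorizer().fit_transform(sentences)

# Low score: the phrases only overlap on the tokens "hyper" and "threading",
# even though their meanings are close.
print(cosine_similarity(vectors[0], vectors[1]))
```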

  • If you want accurate semantic similarity, you probably need a pre-trained model. Or you can try ConceptNet Numberbatch as explained here: https://stackoverflow.com/a/53407328/5619835 (see the sketch after these comments). That title can also give you an idea. – mulaixi Oct 21 '19 at 10:33
  • Welcome to SO, which is about *specific coding* questions; your question is way too broad, please do take some time to read [How to Ask](https://stackoverflow.com/help/how-to-ask) and [What topics can I ask about here?](https://stackoverflow.com/help/on-topic). – desertnaut Oct 21 '19 at 10:38
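
One way to follow the pre-trained-model suggestion from the first comment, sketched here with sentence-transformers purely as an illustration (an assumption; the linked answer uses ConceptNet Numberbatch vectors instead):

```python
# Sketch of the "pre-trained model" route: encode each phrase into a sentence
# embedding and compare the embeddings (sentence-transformers is assumed here;
# any pre-trained sentence encoder, including Numberbatch vectors, could be used).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

embeddings = model.encode([
    "what are the types of hyper threading",
    "is there any categories in hyper threading",
])

# High score despite different wording, because the embeddings capture meaning.
print(util.cos_sim(embeddings[0], embeddings[1]))
```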

1 Answer


This is really hard to do. It is also difficult to pin down what you mean by "accurate" semantic similarity between two phrases; you need to find a good metric for it.

That said, if you have a limited context (you don't need a general-purpose semantic similarity calculator), a very basic approach is to build a text classifier (with machine learning) in which you define the principal classes you want to use.

For example, for your example phrases, you could have the two text classes:

  1. asking about hyperthreading

  2. asking about food

Then you train your model on a lot of phrases, and it outputs probabilities for your example phrases, such as:

  1. "what are the types of hyper threading":

    • asking about hyperthreading 0.9

    • asking about food 0.5

  2. "is there any categories in hyper threading"

    • asking about hyperthreading 0.8

    • asking about food 0.4

Both phrases are classified as "asking about hyperthreading" (because that class has the higher score for each), so one can assume they are similar. One could also use the probability scores to do something more sophisticated (using score differences, etc.).
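
A minimal sketch of this classifier idea, assuming scikit-learn; the training phrases are invented for illustration and a real model would need far more data:

```python
# Train a small text classifier over the two classes, then treat phrases that
# receive the same predicted class as semantically similar.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny illustrative training set; a real model needs many phrases per class.
train_texts = [
    "what is hyper threading",
    "does my cpu support hyper threading",
    "how many threads per core with hyper threading",
    "what should I eat for dinner",
    "best pizza toppings",
    "how do I cook rice",
]
train_labels = [
    "asking about hyperthreading",
    "asking about hyperthreading",
    "asking about hyperthreading",
    "asking about food",
    "asking about food",
    "asking about food",
]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

phrases = [
    "what are the types of hyper threading",
    "is there any categories in hyper threading",
]

predictions = clf.predict(phrases)
print(predictions)
# Same predicted class -> assume the phrases are similar.
print(predictions[0] == predictions[1])
# clf.predict_proba(phrases) gives the per-class scores mentioned above,
# which could feed a more sophisticated comparison.
```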

Nikaido