6

I have two texts:

Text1 : John likes apple

Text2 : Mike hates orange

The two texts have the same syntactic structure, but their meanings differ.

I want to find

1) Syntactic distance between 2 texts

2) Semantic distance between 2 texts

Is there a way to do this with nltk? I am new to NLP.

Ganesh Deshvini

2 Answers

4

Yes, but you are not limited to nltk. One way to measure syntactic distance is part-of-speech (POS) tagging, which maps each word of a sentence to a grammatical tag: https://en.wikipedia.org/wiki/Part-of-speech_tagging

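For example, it maps your sentences to:
Text1: Noun Verb Noun
Text2: Noun Verb Noun

Then you can measure the distance between the two tag sequences, for example with an edit distance. Here the sequences are identical, so the syntactic distance is zero.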
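One possible way to turn the tag sequences into a number is an edit (Levenshtein) distance over the tags, normalized by the longer sequence. The sketch below is only an illustration under that assumption: the tag lists are written by hand to match the example above, and `text3_tags` is a hypothetical extra sentence. In practice the tags could come from a tagger such as `nltk.pos_tag`, and nltk also provides `nltk.edit_distance`, which could replace the hand-written function.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: POS tag lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def syntactic_distance(tags1, tags2):
    """Edit distance scaled to [0, 1] by the length of the longer sequence."""
    longest = max(len(tags1), len(tags2))
    return edit_distance(tags1, tags2) / longest if longest else 0.0


# Hand-written tag sequences (assumption: a tagger would produce these).
text1_tags = ["Noun", "Verb", "Noun"]  # John likes apple
text2_tags = ["Noun", "Verb", "Noun"]  # Mike hates orange
text3_tags = ["Noun", "Verb"]          # hypothetical shorter sentence

print(syntactic_distance(text1_tags, text2_tags))
print(syntactic_distance(text1_tags, text3_tags))
```

Normalizing by length keeps long and short sentences comparable; whether that is desirable depends on the use case.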


For semantic distance, you need a semantic lexicon such as WordNet: look up the synonyms of each word in a sentence, then measure how much the synonym sets of the two sentences overlap.
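The synonym-overlap idea can be sketched roughly as follows. To keep the example self-contained, `TOY_SYNONYMS` is a hypothetical hand-written table, not real WordNet data; with nltk, the synonym sets would instead be built from `nltk.corpus.wordnet.synsets(word)` and each synset's `lemma_names()`. Using Jaccard overlap as the measure is also an assumption; the answer does not name a specific formula.

```python
# Hypothetical synonym table standing in for a real WordNet lookup.
TOY_SYNONYMS = {
    "likes": {"likes", "loves", "enjoys"},
    "loves": {"loves", "likes", "adores"},
    "hates": {"hates", "detests", "loathes"},
    "apple": {"apple"},
    "orange": {"orange"},
}


def expand(words):
    """Union of the synonym sets of all words; unknown words map to themselves."""
    expanded = set()
    for word in words:
        expanded |= TOY_SYNONYMS.get(word, {word})
    return expanded


def semantic_distance(words1, words2):
    """1 minus the Jaccard overlap of the expanded sets (0 = same, 1 = disjoint)."""
    s1, s2 = expand(words1), expand(words2)
    if not s1 and not s2:
        return 0.0
    return 1.0 - len(s1 & s2) / len(s1 | s2)


# Proper names are left out here because they carry no synonyms.
print(semantic_distance(["likes", "apple"], ["loves", "apple"]))
print(semantic_distance(["likes", "apple"], ["hates", "orange"]))
```

With this toy table the second pair shares no synonyms at all, which matches the intuition that the two example texts differ in meaning.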

Masoud
  • This is a good answer. Perhaps you could recommend OP methods of comparison for the 1st case, and a particular word net or resource? I'm sure future readers will be interested too – salezica Aug 17 '16 at 00:16
  • Thanx @Masoud for providing the direction, just have a couple of questions, Do we have any built-in library which calculates the SYNTACTIC distance in nltk? If not then how to measure the distance for the same? any reference/resource you can provide? – Ganesh Deshvini Aug 17 '16 at 09:04
3

For the semantic part, you might want to try word2vec. A simple approach is to average the similarity of the words in the two sentences, or you can come up with your own way to weight the words according to their syntactic role.

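from gensim.models import Word2Vec

# Load a previously trained model; the path is a placeholder.
model = Word2Vec.load("path/to/your/model")

# Cosine similarity of the two word vectors (exposed on model.wv in gensim 4.x).
model.wv.similarity('apple', 'orange')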
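Averaging word similarities over a sentence pair could look something like the sketch below. It is one possible reading of the suggestion, not a fixed recipe: each word in the first sentence is matched to its most similar word in the second, and the best scores are averaged. `TOY_SCORES` and `toy_similarity` are hypothetical stand-ins with made-up numbers; with a trained model, `model.wv.similarity` could be passed in as `word_similarity` instead.

```python
def sentence_similarity(words1, words2, word_similarity):
    """Average, over words1, of the best similarity to any word in words2."""
    if not words1 or not words2:
        return 0.0
    best = [max(word_similarity(w1, w2) for w2 in words2) for w1 in words1]
    return sum(best) / len(best)


# Made-up scores for illustration only; not the output of any real model.
TOY_SCORES = {
    frozenset(["likes", "hates"]): 0.5,
    frozenset(["apple", "orange"]): 0.75,
}


def toy_similarity(w1, w2):
    """Look up a symmetric toy score; identical words score 1.0, unknown pairs 0.0."""
    if w1 == w2:
        return 1.0
    return TOY_SCORES.get(frozenset([w1, w2]), 0.0)


print(sentence_similarity(["likes", "apple"], ["hates", "orange"], toy_similarity))
```

Note that this measure is not symmetric when the sentences differ in length; averaging the two directions is one way to address that.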
aerin