I have recently started working with the Stanford NLP API. Given an input paragraph, it parses the text and returns the analysis.
My question is: how can I use this to compare two different paragraphs?
Are there any worked-out algorithms online that I can refer to?
Any pointers on how to approach this would be greatly appreciated.

Thank you!

vberg

1 Answer

This is a very broad question. What do you mean by comparing two paragraphs? You could actually "compare" two paragraphs with a string edit distance function, without doing any parsing at all. See: https://en.wikipedia.org/wiki/Edit_distance
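As a rough illustration of the edit distance idea, here is a minimal sketch in plain Python. It computes the Levenshtein distance over whitespace-separated tokens and normalizes it to a score between 0 and 1. The function names (`edit_distance`, `similarity`) and the lowercase/whitespace tokenization are my own assumptions for the example, not part of any library.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]


def similarity(p1, p2):
    """Token-level similarity in [0, 1]; 1.0 means identical token sequences."""
    # Assumption: naive lowercase + whitespace tokenization.
    t1, t2 = p1.lower().split(), p2.lower().split()
    longest = max(len(t1), len(t2))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(t1, t2) / longest
```

Calling `similarity(paragraph_a, paragraph_b)` gives a crude surface-level score. It knows nothing about meaning, so two paraphrases with different wording will score low.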

Going one step further, I used a shallow approach that considers only words and their POS tags. You can read more in my MS thesis, starting on page 19: http://josep.valls.name/wordpress/wp-content/uploads/2011/09/MCVAI-JosepVallsVargas-0905.pdf
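To give a feel for what a shallow word-plus-POS comparison could look like, here is a small sketch. This is not the method from the thesis; it is just a Jaccard overlap over (word, tag) pairs, and it assumes the paragraphs have already been tagged (for example by the Stanford tagger) into lists of `(word, tag)` tuples with Penn Treebank tags. The names `shallow_similarity` and `keep_prefixes` are made up for the example.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def shallow_similarity(tagged1, tagged2, keep_prefixes=("NN", "VB", "JJ")):
    """Compare two tagged paragraphs using only content words and their tags.

    Assumption: each input is a list of (word, tag) tuples with Penn Treebank
    tags; only nouns, verbs and adjectives are kept by default.
    """
    def content(tagged):
        return [(w.lower(), t) for w, t in tagged if t.startswith(keep_prefixes)]

    return jaccard(content(tagged1), content(tagged2))
```

Filtering by tag means function words such as determiners do not affect the score, so "The cat sleeps" and "A cat sleeps" come out as identical under this measure.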

If you want to use the full syntactic or dependency parse, you will need to dive into the world of graph similarity. Read more here: https://en.wikipedia.org/wiki/Graph_theory
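Proper graph similarity (graph edit distance, tree kernels and so on) is a big topic, but a very crude starting point is to treat each dependency parse as a set of edges and measure how many edges two parses share. The sketch below assumes you have already extracted the dependencies as `(head, relation, dependent)` tuples from the parser output; that tuple layout and the function name `triple_overlap` are my assumptions, not a Stanford NLP API.

```python
def triple_overlap(deps1, deps2):
    """F1-style overlap between two sets of dependency edges.

    Assumption: each input is an iterable of (head, relation, dependent)
    tuples taken from a dependency parse.
    """
    s1, s2 = set(deps1), set(deps2)
    shared = len(s1 & s2)
    if shared == 0:
        return 0.0
    precision = shared / len(s2)  # fraction of the second parse found in the first
    recall = shared / len(s1)     # fraction of the first parse found in the second
    return 2 * precision * recall / (precision + recall)
```

This ignores graph structure beyond single edges, so it is only a baseline to compare more serious graph similarity measures against.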

Finally, one of the latest trends in the paraphrase identification community is to use word2vec, a tool released by Google to compute word embeddings. You may want to read through the answers to this SO question: "How to calculate the sentence similarity using word2vec model of gensim with python".
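One common baseline with word embeddings is to average the vectors of the words in each paragraph and take the cosine similarity of the two averages. The sketch below shows that idea in plain Python. The `TOY_EMBEDDINGS` dictionary is invented purely for illustration; in practice the vectors would come from a trained word2vec model, and the function names here are hypothetical.

```python
import math


def mean_vector(tokens, embeddings):
    """Average the vectors of all tokens that have an embedding."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None  # no known words in this paragraph
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def paragraph_similarity(p1, p2, embeddings):
    """Cosine similarity of the averaged word vectors of two paragraphs."""
    # Assumption: naive lowercase + whitespace tokenization.
    v1 = mean_vector(p1.lower().split(), embeddings)
    v2 = mean_vector(p2.lower().split(), embeddings)
    if v1 is None or v2 is None:
        return 0.0
    return cosine(v1, v2)


# Made-up 2-dimensional vectors, for illustration only.
TOY_EMBEDDINGS = {"cat": [1.0, 0.0], "kitten": [0.9, 0.1], "car": [0.0, 1.0]}
```

With these toy vectors, `paragraph_similarity("cat", "kitten", TOY_EMBEDDINGS)` is higher than `paragraph_similarity("cat", "car", TOY_EMBEDDINGS)`, which is the behavior you would hope to see from real embeddings as well. Averaging throws away word order, so treat it as a baseline.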

Josep Valls
  • I just ran into another question that may be of interest to you: http://stackoverflow.com/questions/62328/is-there-an-algorithm-that-tells-the-semantic-similarity-of-two-phrases – Josep Valls Jan 20 '16 at 05:47