Let's say that I have a pool (a list) of well-known phrases, like: { "I love you", "Your mother is a ...", "I think I am pregnant" ... } Let's say about 1000 of these. Now I want users to enter free text into a text box, and have some kind of NLP engine digest the text and find the 10 phrases from the pool that are most relevant to it.
- I thought the simplest implementation could be word-based: take one word of the input at a time and look for similar words in the phrases. I'm not sure which similarity measure to use, though.
- What worries me most is the size of the vocabulary I would have to support. I am a single developer working on a demo, and I don't like the idea of entering words into a table by hand.
- I am looking for a free NLP engine. I am agnostic about the language it's written in, but it must be free, NOT some kind of online service that charges per API call.
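
To make the word-based idea concrete, here is a rough sketch of what I have in mind, using only the Python standard library. It weights words by TF-IDF and ranks phrases by cosine similarity, which is just one possible choice of similarity measure, not necessarily the right one. The function names (`build_index`, `top_matches`) and the sample phrases are made up for illustration. The vocabulary is derived from the phrase pool itself, so there is no word table to fill in, but it only matches words that literally appear in both the input and a phrase (no synonyms, no stemming).

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes; a deliberately crude tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def build_index(phrases):
    # Count words per phrase, then compute a smoothed IDF weight for each word.
    docs = [Counter(tokenize(p)) for p in phrases]
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(d.keys())
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    vectors = [{w: tf * idf[w] for w, tf in d.items()} for d in docs]
    return idf, vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_matches(text, phrases, idf, vectors, k=10):
    # Words that never occur in the pool are ignored, since they cannot match anything.
    counts = Counter(tokenize(text))
    query = {w: tf * idf[w] for w, tf in counts.items() if w in idf}
    scored = [(cosine(query, v), p) for p, v in zip(phrases, vectors)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(p, s) for s, p in scored[:k] if s > 0]


if __name__ == "__main__":
    pool = ["I love you", "I think I am pregnant", "Where is the train station"]
    idf, vectors = build_index(pool)
    for phrase, score in top_matches("honestly I think she is pregnant", pool, idf, vectors):
        print(f"{score:.3f}  {phrase}")
```

With about 1000 phrases, scoring every phrase for each input like this should be cheap enough for a demo. What it cannot do is relate "expecting a baby" to "pregnant", which is the part I am hoping an NLP engine would handle.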