I've been reading up on NLP as much as I can and searching on here but haven't found anything that seems to address exactly what I am trying to do. I am pretty new to NLP, only having had some minor exposure before, so far I have gotten the NLP processor I'm using working to where I am able to extract the POS from the text.
I am just working with a small sample document and then with one "input phrase" that I am basically trying to find a match for. The code I've written so far basically does this:
- takes the input phrase and the "searchee (document being searched on)" and breaks them down into Lists of individual words, then also gets the POS for each word. User also puts in one kewyord that is in the input phrase (and should be in doc being searched)
- both Lists are searched for the keyword that the user input, then, for the first place this keyword is found in each document, a set number of words before and after are taken (such as 5). These are put into a dataset for processing, so if one article had:
keyword: football
"A lot of sports are fun, football is a great, yet very physical sport." - Then my process would truncate this down to "are fun, football is a"
My goal is to compare the pieces, such as the "are fun, football is a" for similarity as far as if they are likely to be used in a similar context, etc.
I'm wondering if anyone can point me in the right direction as far as patterns that could be used for this, algorithms, etc. The example above is simplistic, just to give an idea, but I would be planning to make this more complex if I can find the right place to learn more about this. Thanks for any info