I am working with very large document collections {News + Articles} and want to classify natural-language sentences into classes. Please look at the following example:
1- The System enables a user to shut down the server remotely ==> class 1
2- The Application allows a customer to close the machine online ==> (should also be) class 1. Why?
Because the two sentences are built from synonymous words {System ~ Application, enables ~ allows, user ~ customer, shut down ~ close, server ~ machine, remotely ~ online}. So I am training a classifier on some data using similarity rules or word synonyms, plus stemming and maybe lemmatization. I expect that the more rules I add, the better the results will be.
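One way to picture the "synonyms + stemming" idea is a normalization step that runs before any classifier: every word is stemmed and then mapped to one canonical token, so both example sentences end up with the same feature set. This is only a sketch under assumptions: the `SYNONYMS` table is hypothetical and hand-written for the two example sentences (a real system might build it from a thesaurus such as WordNet), and `crude_stem` is a rough stand-in for a real stemmer or lemmatizer.

```python
import re

# Hypothetical synonym table (assumption, hand-written for this example):
# each variant maps to one canonical token.
SYNONYMS = {
    "application": "system",
    "allow": "enable",
    "customer": "user",
    "close": "shut_down",
    "machine": "server",
    "online": "remotely",
}
STOPWORDS = {"the", "a", "an", "to"}


def crude_stem(token: str) -> str:
    """Strip a trailing 'ing' or 's'; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def normalize(sentence: str) -> list[str]:
    """Lowercase, tokenize, drop stopwords, stem, then map synonyms to canonical tokens."""
    # Join the multi-word phrase first so it survives tokenization as one token.
    text = sentence.lower().replace("shut down", "shut_down")
    tokens = re.findall(r"[a-z_]+", text)
    return [SYNONYMS.get(crude_stem(t), crude_stem(t)) for t in tokens if t not in STOPWORDS]


def jaccard(a: list[str], b: list[str]) -> float:
    """Overlap of the two token sets: 1.0 means identical vocabularies."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    s1 = "The System enables a user to shut down the server remotely"
    s2 = "The Application allows a customer to close the machine online"
    print(normalize(s1))  # ['system', 'enable', 'user', 'shut_down', 'server', 'remotely']
    print(jaccard(normalize(s1), normalize(s2)))  # 1.0
```

Because stemming runs before the synonym lookup, the table only needs base forms ("allow", not "allows"); the catch is that every rule has to be written by hand, so coverage grows only as fast as the table does.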
So my question is: what is the best strategy to configure/adjust the classifier around these ideas? Thank you in advance.
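One possible strategy (a sketch, not the only answer) is to keep the classifier simple and put the synonym knowledge into the feature extraction: each sentence becomes a bag of canonical tokens, and a 1-nearest-neighbour classifier compares bags by cosine similarity. The synonym table, the `NearestNeighbourClassifier` class, and the "class 2" training sentence are all hypothetical, made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table (assumption): variant -> canonical token.
SYNONYMS = {
    "application": "system", "allows": "enables", "customer": "user",
    "close": "shut_down", "machine": "server", "online": "remotely",
}
STOPWORDS = {"the", "a", "an", "to"}


def features(sentence: str) -> Counter:
    """Bag of canonical tokens, so synonyms count as the same feature."""
    text = sentence.lower().replace("shut down", "shut_down")
    tokens = re.findall(r"[a-z_]+", text)
    return Counter(SYNONYMS.get(t, t) for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class NearestNeighbourClassifier:
    """1-nearest-neighbour over canonicalized bag-of-words vectors."""

    def __init__(self) -> None:
        self.examples: list[tuple[Counter, str]] = []

    def fit(self, sentences: list[str], labels: list[str]) -> None:
        self.examples = [(features(s), y) for s, y in zip(sentences, labels)]

    def predict(self, sentence: str) -> str:
        # max() raises ValueError if fit() was never called with data.
        query = features(sentence)
        _, label = max(self.examples, key=lambda ex: cosine(query, ex[0]))
        return label


if __name__ == "__main__":
    clf = NearestNeighbourClassifier()
    clf.fit(
        ["The System enables a user to shut down the server remotely",
         "The report describes rising fuel prices in Europe"],  # hypothetical class 2 example
        ["class 1", "class 2"],
    )
    print(clf.predict("The Application allows a customer to close the machine online"))  # class 1
```

The design choice is that the classifier never needs to know about synonyms; it only sees features that have already been merged. The same `features` function could in principle be passed as a callable `analyzer` to scikit-learn's `TfidfVectorizer`, so that a stronger model trained on real data sees the merged features.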