Classifying text into different classes depending on similarity

Question

I am working on very large documents (news and articles) and want to classify natural-language sentences into classes. Please look at the following example:

1- The System enables a user to shut down the server remotely ==> class 1

2- The Application allows a customer to close the machine online ==> (must also be) class 1. Why?

Because both sentences share many synonyms {System ~ Application, enables ~ allows, user ~ customer, shut down ~ close, server ~ machine, remotely ~ online}. So I am training a classifier on some data using similarity rules (synonyms of the words) plus stemming and maybe lemmatization; the more rules there are, the better the results should be.

So the question is: what is the best strategy to configure or adjust the classifier for this idea? Thank you in advance.

check out: radimrehurek.com/gensim/models/doc2vec.html – alvas Aug 30 '15 at 08:00

score 0 · Answer 1 · edited May 23 '17 at 11:58


Have you taken a look at this question?

Is there an algorithm that tells the semantic similarity of two phrases

The most important step is to determine what similarity means. Once you have done that, choosing a classifier is the easy part of the task (ID3, C4.5, naive Bayes over bag-of-words features, etc.).
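To make that concrete, here is a minimal sketch of one possible definition of similarity, using the example sentences from the question. It assumes a small hand-written synonym table (`CANONICAL`), a tiny stopword list, and Jaccard overlap as the similarity measure; all of these names and choices are illustrative, not a recommendation. In practice the synonym table would probably come from a thesaurus or from learned embeddings, and stemming or lemmatization would be added to the normalization step.

```python
import re

# Hypothetical synonym table: maps a word or phrase to a canonical token.
# Hand-written here for illustration only.
CANONICAL = {
    "application": "system",
    "allows": "enables",
    "customer": "user",
    "shut down": "close",
    "machine": "server",
    "online": "remotely",
}
STOPWORDS = {"the", "a", "an", "to"}


def normalize(sentence):
    """Lowercase, map synonyms to canonical tokens, drop stopwords."""
    text = sentence.lower()
    # Replace longer phrases first so "shut down" is handled as one unit.
    for phrase in sorted(CANONICAL, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", CANONICAL[phrase], text)
    return {tok for tok in re.findall(r"[a-z]+", text) if tok not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(sentence, labeled_examples):
    """Return the label of the most similar labeled example (1-nearest neighbour)."""
    tokens = normalize(sentence)
    best = max(labeled_examples, key=lambda ex: jaccard(tokens, normalize(ex[0])))
    return best[1]


training = [
    ("The System enables a user to shut down the server remotely", "class 1"),
    ("The report lists monthly sales figures by region", "class 2"),
]
query = "The Application allows a customer to close the machine online"
print(classify(query, training))
```

Once the similarity function is fixed like this, the nearest-neighbour step can be swapped for any of the classifiers mentioned above without changing the normalization.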

edited May 23 '17 at 11:58 · Community

answered Sep 01 '15 at 08:35 · rpd
