I need to train a neural network to classify text documents into a boolean class (the NN has one output with the values "Yes" or "No").
Is there an algorithm for finding the best input parameters (for example, the presence of a word, term, or sentence, and/or the frequency/repetition of a word, and so on)?
If not, can you give me a starting point for finding these parameters (how should I select them)?

Thanks

Ariyan
  • 1
    Ideally, all of the above! If you can give the neural network words, terms, word n-grams, character n-grams, etc., then you're giving it more inputs from which to make decisions. It probably doesn't make sense to give the NN sentences or anything larger than a term. However, the more inputs you have, the slower your algorithm will run, so you have to tweak it until you get results that you're satisfied with. There is no magic bullet for this. You could build another AI algorithm to provide the NN with different inputs, but you may end up with the same problem for that AI algorithm too. – Kiril Nov 18 '11 at 19:37
  • @Lirik: I don't mean giving a sentence to the NN. I mean giving it boolean inputs that show the presence or absence of a term, and/or the repetition count of a word, and so on. I'm not thinking of more than 10-15 inputs. – Ariyan Nov 18 '11 at 19:57
  • 1
    My answer won't change much... There is no algorithm that can optimize that for you (unless you build another AI algorithm to do it), so you should try tweaking the NN with everything that could possibly increase the accuracy. It's a tedious process and I don't know a way around it. – Kiril Nov 18 '11 at 20:28

1 Answer

The standard approach I know of is to use a vector of words/terms and assign each one a negative or positive score using a learning or statistical algorithm. Even perceptron learning should suffice; you just need a good set of positive and negative examples.

To my knowledge this is the way all spam filters work, and they work pretty well.
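
As a rough illustration of the word-vector idea, here is a minimal perceptron sketch in plain Python. It assumes binary word-presence features, whitespace tokenization, and labels of +1 ("Yes") and -1 ("No"). The function names and the toy training data are hypothetical, made up only for this example, so treat it as a starting point rather than a tuned classifier.

```python
from collections import defaultdict


def features(text):
    # Binary bag-of-words: the set of distinct lowercase tokens.
    # Assumption: whitespace splitting is good enough for this sketch.
    return set(text.lower().split())


def train_perceptron(examples, epochs=10):
    """Learn one score per word from (text, label) pairs, label in {+1, -1}."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = features(text)
            score = bias + sum(weights[w] for w in feats)
            predicted = 1 if score > 0 else -1
            if predicted != label:
                # Perceptron update: move each present word's score
                # toward the correct class.
                for w in feats:
                    weights[w] += label
                bias += label
    return weights, bias


def classify(text, weights, bias):
    # True means "Yes", False means "No"; unseen words contribute nothing.
    score = bias + sum(weights.get(w, 0.0) for w in features(text))
    return score > 0


# Hypothetical toy data: +1 = "Yes" (spam-like), -1 = "No".
examples = [
    ("cheap pills buy now", 1),
    ("win money now", 1),
    ("meeting agenda for monday", -1),
    ("lunch on monday", -1),
]

weights, bias = train_perceptron(examples)
print(classify("buy cheap pills", weights, bias))
print(classify("agenda for lunch meeting", weights, bias))
```

After training, words with large positive or negative scores in `weights` are the ones that matter most to the decision, which gives one simple way to shortlist a small set of inputs.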

WeaselFox