
I'm trying to use SVMlight to build a classifier that detects whether a noun phrase (NP) is anaphoric or not. I have my features, but I'm stuck on the format of the input file: should I convert all of my text to this format, or only the NPs that serve as positive and negative instances? And is there any software that can convert my file to this format?

<line> .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info> 
<target> .=. +1 | -1 | 0 | <float>  // for a positive instance, should I put +1?
<feature> .=. <integer> | "qid"  // should I write this for every one of my features?
<value> .=. <float>
<info> .=. <string>  // should this contain the NP?

Also, what exactly should the model file contain?

Your help would be very much appreciated.

AbirH

1 Answer


Quoting Cornell's official SVMlight documentation, here is an example line in the input format:

-1 1:0.43 3:0.12 9284:0.2

As far as I understand it, each line describes one example (in your case, one NP) as a target followed by its non-zero features. The line above is a negative example (target -1) in which feature 1 has the value 0.43, feature 3 has the value 0.12, feature 9284 has the value 0.2, and every other feature has the value 0.
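To make the format concrete, here is a minimal Python sketch that builds such a line from a feature dictionary. The function name `to_svmlight_line` and its parameters are my own for illustration; they are not part of SVMlight. It assumes your features are already mapped to positive integer ids, and it writes them in increasing order and leaves out zero-valued ones, as the format expects.

```python
def to_svmlight_line(target, features, info=None):
    """Format one example as a line of SVMlight input (hypothetical helper).

    target   -- the label, e.g. +1 or -1
    features -- dict mapping positive integer feature ids to numeric values
    info     -- optional free text stored after '#', e.g. the NP itself
    """
    # Feature ids go in increasing order; features with value 0 are omitted.
    pairs = [f"{fid}:{val:g}" for fid, val in sorted(features.items()) if val != 0]
    line = " ".join([str(target)] + pairs)
    if info is not None:
        line += f" # {info}"
    return line


# Reproduces the example line from the documentation.
print(to_svmlight_line(-1, {1: 0.43, 3: 0.12, 9284: 0.2}))
```

Writing one such line per NP, with the NP text passed as `info`, would give a training file in which each line can still be traced back to its phrase.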

As for software, source code, or a library that generates this format: I am looking for one too, so I cannot answer that part for you. But I hope the explanation of the format is clear.

Nandadeep