I am using the svmlight package in Python to train an SVM ranking model. However, I cannot figure out how to pass the training data to the learn function. My Python source code is as follows:

import svmlight

trainingDat = open('train.dat','r')
model = svmlight.learn(trainingDat, type='ranking')

The data file (train.dat) looks like this:

# query 1
3 qid:1 1:1 2:1 3:0 4:0.2 5:0
2 qid:1 1:0 2:0 3:1 4:0.1 5:1
1 qid:1 1:0 2:1 3:0 4:0.4 5:0
1 qid:1 1:0 2:0 3:1 4:0.3 5:0
# query 2
1 qid:2 1:0 2:0 3:1 4:0.2 5:0
2 qid:2 1:1 2:0 3:1 4:0.4 5:0
1 qid:2 1:0 2:0 3:1 4:0.1 5:0
1 qid:2 1:0 2:0 3:1 4:0.2 5:0
# query 3
2 qid:3 1:0 2:0 3:1 4:0.1 5:1
3 qid:3 1:1 2:1 3:0 4:0.3 5:0
4 qid:3 1:1 2:0 3:0 4:0.4 5:1
1 qid:3 1:0 2:1 3:1 4:0.5 5:0

I get the following error on running the code:

TypeError: document should be a tuple

I looked for similar questions and found one: Load svmlight format error

The answer to that question suggests implementing a parser that reads the data file shown above and converts each line to a tuple of features and target. However, when training a ranker, we also need to say which set (query) each instance belongs to, at least in theory.

My question: how do I pass training data to the svmlight learn function when using the ranking configuration?

Thank you in advance!!

Sarin

1 Answer

The training data should be passed as a list of triples (3-tuples), one per instance, each in the following format:

(<label>, [(<feature>, <value>), ...], <queryid>)

Source: https://pypi.python.org/pypi/svmlight

I had to write a parser, similar to the one mentioned in "Load svmlight format error", to convert the data from the SVMlight file into the format above.
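
For anyone landing here, such a parser could look roughly like the sketch below. This is not my original code: the function names parse_svmlight_line and parse_svmlight_lines are made up for illustration, and storing the label as a float is an assumption. It only relies on the file layout shown in the question (label, then qid:<n>, then feature:value pairs, with # starting a comment).

```python
def parse_svmlight_line(line):
    """Turn one SVMlight ranking line into (label, [(feature, value), ...], queryid).

    Returns None for blank lines and comment-only lines.
    (Hypothetical helper, not part of the svmlight package.)
    """
    # Everything after '#' is a comment in the SVMlight file format.
    line = line.split('#', 1)[0].strip()
    if not line:
        return None

    tokens = line.split()
    label = float(tokens[0])  # assumption: labels are stored as floats
    queryid = None
    features = []
    for token in tokens[1:]:
        key, value = token.split(':', 1)
        if key == 'qid':
            queryid = int(value)
        else:
            features.append((int(key), float(value)))
    return (label, features, queryid)


def parse_svmlight_lines(lines):
    """Parse an iterable of lines (e.g. an open file), skipping comments and blanks."""
    parsed = (parse_svmlight_line(line) for line in lines)
    return [doc for doc in parsed if doc is not None]


# First two rows of the train.dat shown in the question.
sample = [
    "# query 1",
    "3 qid:1 1:1 2:1 3:0 4:0.2 5:0",
    "2 qid:1 1:0 2:0 3:1 4:0.1 5:1",
]
training_data = parse_svmlight_lines(sample)
```

With the real file, the list would come from something like parse_svmlight_lines(open('train.dat')) and would then be passed to svmlight.learn(training_data, type='ranking') in place of the file object used in the question.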

Hope this helps!!

Sarin