I'm attempting to classify some inputs (text classification: 10,000+ examples, 100,000+ features), and I've read that LibLinear is far faster and more memory-efficient for such tasks. So I've ported my LibSVM classifier to Accord.NET, like so:
// SVM settings
var teacher = new MulticlassSupportVectorLearning<Linear, Sparse<double>>()
{
    // Using LIBLINEAR's L2-loss SVC dual for each SVM
    Learner = (p) => new LinearDualCoordinateDescent<Linear, Sparse<double>>()
    {
        Loss = Loss.L2,
        Complexity = 1,
    }
};

var inputs = allTerms.Select(t => new Sparse<double>(
    t.Sentence.Select(s => s.Index).ToArray(),
    t.Sentence.Select(s => (double)s.Value).ToArray())).ToArray();
var classes = allTerms.Select(t => t.Class).ToArray();

// Train the model
var model = teacher.Learn(inputs, classes);
At the point of .Learn() I get an instant OutOfMemoryException.
I've seen there's a CacheSize setting in the documentation; however, I cannot find where to lower this setting, as shown in many examples.
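For reference, every example I've found sets CacheSize on the kernel SMO learner rather than on LinearDualCoordinateDescent; my assumption is that the property only exists there, since it controls the kernel matrix cache:

// What the examples show (kernel SMO, not the linear solver); I assume
// CacheSize is only available here, as it sizes the kernel matrix cache
var smoTeacher = new MulticlassSupportVectorLearning<Linear, Sparse<double>>()
{
    Learner = (p) => new SequentialMinimalOptimization<Linear, Sparse<double>>()
    {
        CacheSize = 1000 // rows of the kernel matrix to keep cached
    }
};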
One possible cause: since I'm using the 'hashing trick' instead of sequential indices, is Accord.NET attempting to allocate an array spanning the full hash space (probably close to int.MaxValue)? If so, is there any way to avoid this?
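In case it helps, here is the workaround I'm sketching, assuming the allocation really is proportional to the largest feature index. I don't know whether Sparse<double> requires ascending indices, so the sketch sorts them to be safe:

// Workaround sketch: remap each observed hash index to a compact 0..N-1 range
// before building the Sparse<double> vectors, so any allocation proportional
// to the largest index is bounded by the number of distinct features actually
// seen, not by the hash space.
// (Uses System.Collections.Generic, System.Linq and Accord.Math.)
var indexMap = new Dictionary<int, int>();
int Compact(int hashIndex)
{
    // Assign the next free compact index the first time a hash value is seen
    if (!indexMap.TryGetValue(hashIndex, out int compact))
        indexMap[hashIndex] = compact = indexMap.Count;
    return compact;
}

var inputs = allTerms.Select(t =>
{
    var pairs = t.Sentence
        .Select(s => (Index: Compact(s.Index), Value: (double)s.Value))
        .OrderBy(p => p.Index) // keep indices ascending, in case Sparse<T> expects it
        .ToArray();
    return new Sparse<double>(
        pairs.Select(p => p.Index).ToArray(),
        pairs.Select(p => p.Value).ToArray());
}).ToArray();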
Any help is most appreciated!