4

I have a few million Entities with 1 to 10 attributes describing each of them and about one hundred thousand Classes to sort them into.

Are there any Machine Learning algorithms (ideally available on SQL Server, Azure or as .NET library) or a stand-alone tools for massive Multiclass Classification capable of suggesting the top few best matching Classes for each of the Entities?

I have found this research along the lines: Learning compact class codes for fast inference in large multi class classification, but could not find any implementations.

At the moment I have sort of a K-nearest neighbours based on Full-Text Search with a couple of other dimensions weighted at 1/3 each to improve the results. I am looking for the ways to improve both performance and accuracy.

Y.B.
  • 3,526
  • 14
  • 24
  • To those voting to put the question on hold as off-topic: **I believe that in the world of machine learning a discussion of algorithms suitability for a particular scenario is no more opinionated then a discussion of [How can I check if one string contains another substring?](http://stackoverflow.com/questions/1789945/how-can-i-check-if-one-string-contains-another-substring) in JavaScript.** – Y.B. May 24 '16 at 10:58

1 Answers1

2

Have you tried ensemble learning? It's all about building multiple "weak" multiclass classifiers and finding a consensus through majority voting. The main advantage is because you can randomly select samples of you dataset and each classifier can learn from different sets. You can also try deep learning with Convolutional Neural Networks implemented with TensorFlow or Theano (I would recommend the last one). If you have a GPU you can make use of its processing capability to improve the training step. This code here https://github.com/attardi/CNN_sentence uses GPU processing, theano library and multiclass classification (for NLP applications), but it's not in C# as you asked.

  • Thank you for the link. It was interesting reading. It does seem to address the performance and accuracy although I am not sure it can handle thousands of target Classes. Anyway, worth trying. – Y.B. May 23 '16 at 14:17
  • I have not tried ensemble learning because so far I was testing Azure machine learning only and could not find a single suitable algorithm. It can be that I should apply some sort of R processing to the results, but I have not investigated that yet. – Y.B. May 23 '16 at 14:22