
Imagine that the universe of all known mappings between the values of a set of variables V and a set of tag names T (classification labels) is known. Further, assume that the total space of unique variable-value combinations is large (> 100B points), that the tag set is relatively small (thousands of elements), and that the number of variables is very small (4-10).

What is an algorithm for building a classifier function that provides a perfect mapping (matching the a priori knowledge with no false positives or false negatives) from variable values to labels, with the following space and time complexity goals:

  • Time complexity lower than O(|V|*log|T|)
  • Space complexity less than O(|V|k), k ≤ e

Or, rephrased as a decision tree problem:

  1. How can a decision tree algorithm be tuned to create a perfect mapping?
  2. How can the training data be represented efficiently to guarantee that?
Sim
  • You haven't defined what a "perfect" mapping is. – phs Apr 04 '13 at 05:58
  • How is the prior knowledge represented? How are you able to say to which class any instance belongs, without enumerating all possibilities? It seems to me that *this* is the classification rule you should be using, not trying to get a decision tree to match this prior knowledge. Perhaps if you explain the context? – Ben Allison Apr 04 '13 at 15:08
  • @BenAllison very good question. The "class" space is generated by the output of the following question: http://stackoverflow.com/questions/15803808/computing-unique-intersections-of-subsets – Sim Apr 04 '13 at 16:09

1 Answer


What you are trying to achieve should be possible with any decision tree classifier that allows you to specify the level of pruning in some way. The idea is to make it not do any pruning at all. The decision tree you would end up with would have (potentially) one leaf per training instance (i.e. very large) but would give you "perfect" accuracy with a prediction time O(|V|*log|T|).
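
A minimal sketch of such an unpruned tree, assuming scikit-learn; the thread doesn't name a library, so the API choice and the toy data are purely illustrative:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Toy data: each row is one variable-value combination, y holds tag indices.
    X = np.array([[0, 1, 2], [0, 1, 3], [1, 0, 2], [1, 1, 3]])
    y = np.array([0, 1, 1, 2])

    # These settings disable pruning entirely, so the tree grows until every
    # leaf is pure -- potentially one leaf per training instance.
    clf = DecisionTreeClassifier(max_depth=None,      # no depth limit
                                 min_samples_leaf=1,  # single-instance leaves allowed
                                 ccp_alpha=0.0)       # no cost-complexity pruning
    clf.fit(X, y)

    # "Perfect" accuracy on the training data: no false positives or negatives.
    assert (clf.predict(X) == y).all()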

This is completely independent of how the training data is represented (as it should be). The only thing that matters is that the decision tree inducer can read and process it. One simple way of building such a tree is to add a path for the first example, then merge in one for the second, and so on.
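
A minimal sketch of that path-merging construction, assuming the variables are tested in a fixed order; a real inducer would choose the split order by a criterion such as information gain, and the data here is made up:

    # Each training instance contributes one root-to-leaf path through the
    # tree; shared prefixes of earlier instances are reused (merged).
    def build_tree(examples):
        """examples: iterable of (values, label) pairs; values is a tuple."""
        root = {}
        for values, label in examples:
            node = root
            for v in values[:-1]:
                node = node.setdefault(v, {})  # merge with any existing path
            node[values[-1]] = label           # the leaf stores the tag
        return root

    def classify(tree, values):
        node = tree
        for v in values:
            node = node[v]  # KeyError means the combination was never seen
        return node

    tree = build_tree([((0, 1, 2), "a"), ((0, 1, 3), "b"), ((1, 0, 2), "b")])
    assert classify(tree, (0, 1, 3)) == "b"

Classification then walks a single path, costing one dictionary probe per variable.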

Whether such a classifier would be useful in practice is a completely different question of course -- in most cases it won't be.

Lars Kotthoff
  • Right, hence my somewhat fuzzy definition of "good" algorithm. I will specify tighter bounds on space complexity and the number of leaf nodes in the problem description. Clearly, what you describe as the naive algorithm will be both impractical to compute given the size of the problems we are dealing with and impractical to use. – Sim Apr 04 '13 at 16:00
  • If you want efficient classification, why not simply hash the feature values for all instances and associate with the value you want to predict? (See the sketch after these comments.) – Lars Kotthoff Apr 04 '13 at 16:02
  • Note the size of the problem is > 100B points. Mapping changes frequently due to changes in the data set. Not practical to cache. – Sim Apr 04 '13 at 16:07
  • If you want perfect classification you won't be able to do much better in terms of size than storing the entire data set in any case. Unless you have a specific case where everything simplifies nicely. – Lars Kotthoff Apr 04 '13 at 16:11
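
A minimal sketch of the hashing approach suggested in the comments, keying each instance by its full tuple of variable values (the data is illustrative). As the last comment notes, the table stores one entry per distinct combination, i.e. essentially the whole data set:

    # Exact-match lookup table: key is the tuple of variable values, value is
    # the tag. Hashing the key costs O(number of variables) on average.
    table = {
        (0, 1, 2): "a",
        (0, 1, 3): "b",
        (1, 0, 2): "b",
    }

    def lookup(values):
        # Raises KeyError for combinations that were never enumerated.
        return table[tuple(values)]

    assert lookup((0, 1, 3)) == "b"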