
I have experience with neural networks, specifically back-propagating ones, and I know that when a hidden layer is introduced, dependencies between the inputs passed to the trainer become part of the resulting model's knowledge.

Is the same true for decision trees?

I have found information about these algorithms (ID3, etc.) somewhat hard to come by. I have been able to find the algorithms themselves, but information such as expected/optimal dataset formats and other overviews is rare.

Thanks.


1 Answer


Decision trees are actually very easy to provide data to, because all they need is a table of data and an indication of which column (feature) of that table you want to predict. The data can be discrete or continuous for any feature. There are several flavors of decision trees with differing support for continuous and discrete values, and they work differently, so understanding how each one works can be challenging.
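As a concrete illustration, here is a minimal sketch using scikit-learn (the library choice, the feature encoding, and the column meanings are my own assumptions, not something from the question):

```python
# A minimal sketch: a "table" of feature columns plus one target column
# is all a decision tree needs.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical table: each row is [outlook, temperature], mixing a
# discrete feature (encoded as an integer) with a continuous one.
X = [
    [0, 30.5],   # sunny, 30.5 degrees
    [0, 18.0],   # sunny, 18.0 degrees
    [1, 22.3],   # rainy, 22.3 degrees
    [1, 15.1],   # rainy, 15.1 degrees
]
y = [1, 0, 0, 0]  # the column we want to predict, e.g. "play outside"

clf = DecisionTreeClassifier()  # a CART-style tree under the hood
clf.fit(X, y)
print(clf.predict([[0, 28.0]]))  # classify a new row
```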

Different decision tree algorithms with comparison of complexity or performance

Depending on the type of algorithm you are interested in, it can be hard to find information without reading the actual papers if you want to try to implement it. I've implemented the CART algorithm, and the only option for that was to find the original 200-page book about it. Most other treatments discuss ideas like splitting in enough detail, but fail to cover any other aspect at more than a high level.

As for whether they take dependencies between inputs into account: I believe a decision tree only assumes dependence between each input feature and the prediction feature. If an input were independent of the prediction feature, you couldn't use it as a split criterion. Between the input features themselves, however, I believe they are assumed to be independent of each other. I'd have to check the book to be sure, but off the top of my head I think that's true.
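To see why a feature that is independent of the prediction feature can't drive a split, here is a small ID3-style information-gain sketch (my own illustration; the data and function names are hypothetical):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy reduction from splitting `labels` by `feature_values`."""
    n = len(labels)
    split = {}
    for v, lbl in zip(feature_values, labels):
        split.setdefault(v, []).append(lbl)
    remainder = sum(len(part) / n * entropy(part) for part in split.values())
    return entropy(labels) - remainder

labels      = [1, 0, 1, 0, 1, 0, 1, 0]
dependent   = [1, 0, 1, 0, 1, 0, 1, 0]  # perfectly tracks the target
independent = [0, 0, 1, 1, 0, 0, 1, 1]  # unrelated to the target

print(information_gain(dependent, labels))    # 1.0: a useful split
print(information_gain(independent, labels))  # 0.0: useless as a split
```

ID3 picks the feature with the highest gain at each node, so a feature whose gain is zero never gets chosen.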

  • Thanks for the answer. I am thinking one approach could be to modify the algorithm to compute the relationships between input values and use the results as additional input values (see the sketch after these comments). This would essentially square the number of inputs being fed into the trees, but may perhaps expose some additional association between the datapoints and the output. – Patrick McCurley May 03 '12 at 09:38
  • One thing unique about the CART algorithm is that it's a binary tree: a split always splits into two branches. But it can choose to split along multiple values and multiple features with surrogate splits. You might want to research surrogate splitting in CART, because that splits along several features at once. Squaring the inputs would be prohibitively expensive, because CART already has an O(n!) algorithm when testing a split on a categorical feature that has to be optimized. And squaring only takes relationships between 2 features into account; what if the relationships are among 4, 5, or 6?... – chubbsondubs May 03 '12 at 14:35
  • I think ultimately you are trying to find the dependence relationship between features, which I think is covered by surrogate splits. In a relationship between two features you'll find that one of the splits dominates the other, but you can apply a threshold for how much the 2nd, 3rd, etc. features contribute to the split. – chubbsondubs May 03 '12 at 14:40
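For reference, here is a minimal sketch of the idea from the first comment (using pairwise products as the "relationship" calculation is my own assumption; the comment doesn't name a specific calculation):

```python
from itertools import combinations

def add_pairwise_features(rows):
    """Append the product of every pair of input values as extra columns,
    exposing pairwise relationships to the tree directly. This roughly
    squares the number of columns, as the comments above note."""
    return [row + [a * b for a, b in combinations(row, 2)] for row in rows]

rows = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
print(add_pairwise_features(rows))
# [[1.0, 2.0, 3.0, 2.0, 3.0, 6.0], [4.0, 5.0, 6.0, 20.0, 24.0, 30.0]]
```

As the second comment points out, this only captures pairwise relationships; interactions among three or more features would need higher-order terms, which grow combinatorially.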