I'm having trouble finding the right parameters for the information gain when I don't have any discrete values and therefore first need to discretize the data points into intervals.
What I have:
I'm doing image processing, where my features have a possible range of 0-255. With some training data I can define some intervals (which only define "is object" or "is not object"). If goods is the number of intervals for a matching point and bads is the number of intervals labeled as its environment, I calculate the information gain for this case as follows:
where
Results and idea:
For some reason I end up with a negative IG, which is quite nonsensical, but I don't see the error. Another idea was, instead of counting the object-matching intervals for good, to count the samples in good that fit into any good interval.
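For reference, here is a minimal Python sketch of that second idea, counting samples rather than intervals. It assumes the standard Shannon-entropy definition of information gain (entropy of the labels minus the sample-weighted entropy inside each interval), which may differ from the formula I used above. The names `entropy`, `interval_index`, `information_gain`, the cut points in `edges`, and the toy data are all hypothetical and not taken from my actual code.

```python
import bisect
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def interval_index(value, edges):
    """Index of the interval containing value.

    edges is a sorted list of inner cut points (an assumption of this
    sketch), so there are len(edges) + 1 intervals in total.
    """
    return bisect.bisect_right(edges, value)


def information_gain(values, labels, edges):
    """Entropy of labels minus the weighted entropy within each interval."""
    bins = {}
    for v, y in zip(values, labels):
        bins.setdefault(interval_index(v, edges), []).append(y)
    total = len(labels)
    conditional = sum(len(ys) / total * entropy(ys) for ys in bins.values())
    return entropy(labels) - conditional


# Toy data: feature values in 0-255, True means "is object".
values = [10, 20, 200, 220]
labels = [False, False, True, True]
print(information_gain(values, labels, edges=[128]))
```

With this definition, where every sample is counted and weighted by the size of its interval, the result should not drop below zero apart from floating-point rounding, so a negative value would hint at a mistake in the counting or the weights.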
Does anyone have an idea?