Compared to a kick and a snare, for example, a beatboxed input is likely to have less difference between the individual sounds, and is therefore likely to be harder for Ableton to split into separate sounds (depending on the beatboxer). In any case, some combination of frequency content and amplitude envelope (more specifically attack, decay, sustain, and release), and perhaps the different overtone combinations that account for differences in timbre, are the characteristics that would have to be evaluated in order to separate the kick, snare, and hi-hat.
Before this feature existed, I used gates and high/low-pass filters to accomplish a similar task. So perhaps Ableton's solution is not as complicated as we might imagine.
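To make the gate-and-filter idea concrete, here is a rough Python sketch. It is only an illustration of the manual approach, not a description of what Ableton actually does. The function names, the 200 Hz and 5000 Hz cutoffs, and the rule "the loudest band names the drum" are all my own assumptions, and the one-pole filters are much gentler than the filters you would reach for in a real session.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Basic one-pole low-pass filter (gentle 6 dB/octave slope)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)  # move the output a fraction of the way to the input
        out.append(y)
    return out

def highpass(samples, cutoff_hz, sample_rate):
    """High-pass built as 'input minus its low-passed copy'."""
    low = one_pole_lowpass(samples, cutoff_hz, sample_rate)
    return [x - l for x, l in zip(samples, low)]

def gate(samples, threshold):
    """Hard gate: silence every sample whose level is below the threshold."""
    return [x if abs(x) >= threshold else 0.0 for x in samples]

def band_energies(samples, sample_rate, low_cut=200.0, high_cut=5000.0):
    """Split a hit into low/mid/high bands and sum the energy in each.

    The cutoffs are assumed values for illustration, not Ableton settings.
    """
    low = one_pole_lowpass(samples, low_cut, sample_rate)
    high = highpass(samples, high_cut, sample_rate)
    # Whatever is left after removing the low and high bands is the mid band.
    mid = [x - l - h for x, l, h in zip(samples, low, high)]

    def energy(band):
        return sum(v * v for v in band)

    return {"kick": energy(low), "snare": energy(mid), "hihat": energy(high)}

def classify_hit(samples, sample_rate):
    """Label a hit by whichever band holds the most energy (assumed rule)."""
    energies = band_energies(samples, sample_rate)
    return max(energies, key=energies.get)
```

In practice you would gate the input first to isolate each hit, then pass each hit to `classify_hit`. With beatboxing the bands overlap far more than with real drums, so the cutoffs would need tuning per performer, and the envelope (how fast each hit attacks and decays) would probably have to be considered as well.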