I'm trying to group students by their interests. The groups should be roughly the same size, even if that means that students who don't fit into any group end up with group members they share few interests with.
I used R's `hclust()` function and got a really nice dendrogram, so that part works perfectly. But when I try to assign clusters using `cutree()`, I can only adjust `h` (the height at which the tree is cut) or `k` (the desired number of groups). The problem is that even if I choose `k` so that the average group size matches what I want, I get some groups that are way smaller.
If you look at the plotted tree, there are some students whose interests are completely different from those of the rest, so I guess that's the reason why it happens.
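To illustrate what I mean, here is a minimal sketch with made-up data (the `interests` matrix and the two artificial outliers are assumptions for the example, not my real data). It uses the default complete linkage and cuts the tree into four groups:

```r
# Hypothetical data: 30 students, 5 interest scores each
set.seed(1)
interests <- matrix(runif(30 * 5), nrow = 30)
interests[1, ] <- interests[1, ] + 5   # a student far away from everyone else
interests[2, ] <- interests[2, ] - 5   # another one, in the opposite direction

hc <- hclust(dist(interests))          # default method is complete linkage
groups <- cutree(hc, k = 4)            # k is the number of groups, not their size
sizes <- table(groups)                 # number of students per group
print(sizes)
```

With data like this, each outlier ends up in a group of its own and the remaining students are split across the other groups, which is the pattern I see in my real data.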
What I'd like to do to prevent this is to "forbid" groups below a certain minimum size, so that any such groups are merged into another small group or something like that. Is there an easy way to do this, or do I have to write my own function to clean up after the clustering?
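For reference, the kind of cleanup I have in mind would look roughly like the sketch below. `merge_small_groups()` and its `min_size` argument are hypothetical names I made up, and using the mean distance to decide on the closest group is just one possible choice. It only enforces a minimum size; it does not make the groups equally large.

```r
# Hypothetical helper: repeatedly dissolve the smallest group while it is
# below min_size, moving each of its members to the remaining group with
# the lowest mean distance to that member.
merge_small_groups <- function(d, groups, min_size) {
  dm <- as.matrix(d)                       # full distance matrix
  repeat {
    sizes <- table(groups)
    if (length(sizes) <= 1 || min(sizes) >= min_size) break
    smallest <- as.integer(names(sizes)[which.min(sizes)])
    members <- which(groups == smallest)
    others <- setdiff(unique(groups), smallest)
    for (i in members) {
      mean_dist <- sapply(others, function(g) mean(dm[i, groups == g]))
      groups[i] <- others[which.min(mean_dist)]
    }
  }
  groups
}

# Made-up example data with one outlier student
set.seed(1)
interests <- matrix(runif(30 * 5), nrow = 30)
interests[1, ] <- interests[1, ] + 5
d <- dist(interests)
groups <- cutree(hclust(d), k = 4)
cleaned <- merge_small_groups(d, groups, min_size = 3)
print(table(cleaned))
```

Each pass removes one group entirely, so the loop stops once every remaining group has at least `min_size` members or only one group is left.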
I found similar questions on Stack Overflow (e.g. this one), but none of them has an accepted answer, and in the particular case I mentioned, I'm afraid I don't really understand the proposed solution.
Thanks in advance for your input!
Merle