I have a dataframe in which the rows are samples and the columns are proteins. An additional categorical column holds the name of the group each sample belongs to:

Sample  Group   protein1  protein2
s1      group1  2.5       0.1
s2      group2  0.2       3.0

The number of samples differs between groups, so I would like to randomly sample from each group based on the size of the smallest group (say group1), combine the draws into one dataframe, and then cluster that dataframe with mclust. I would like to repeat this process until all samples have been used, and for a fixed number of iterations, say 10. At every random sampling step I want samples from each group. Finally, I would like a table that lists the samples that were selected for clustering with mclust and the optimal K that was found using those samples.

mclust optimal k  iteration  number of samples from group1  number of samples from group2
2                 1          5                              5
2                 2          5                              6
n                 ..10       5                              7
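
One way the procedure described above could be sketched in R is shown below. It is a rough starting point under several assumptions, not a tested solution: the column names `Sample` and `Group` follow the example table, the helper names `balanced_draw`, `run_balanced_clustering` and `fit_k` are hypothetical, the toy data are simulated, and every group contributes the same number of samples per draw (the size of the smallest group). "Until all samples have been used" is only tracked as a running fraction here, not enforced. The range `G = 1:4` passed to `Mclust` is an arbitrary choice for the toy data.

```r
# Draw the same number of rows from every group (default: size of the
# smallest group), without replacement within a single draw.
balanced_draw <- function(df, group_col = "Group",
                          n_per_group = min(table(df[[group_col]]))) {
  rows_by_group <- split(seq_len(nrow(df)), df[[group_col]], drop = TRUE)
  idx <- unlist(lapply(rows_by_group, function(rows) {
    rows[sample.int(length(rows), n_per_group)]
  }), use.names = FALSE)
  df[idx, , drop = FALSE]
}

# Repeat the balanced draw n_iter times; fit_k is any function that takes a
# numeric matrix and returns the chosen number of clusters.
run_balanced_clustering <- function(df, protein_cols, fit_k, n_iter = 10,
                                    group_col = "Group", id_col = "Sample") {
  used <- character(0)
  out <- vector("list", n_iter)
  for (i in seq_len(n_iter)) {
    sub <- balanced_draw(df, group_col)
    used <- union(used, as.character(sub[[id_col]]))
    out[[i]] <- data.frame(
      iteration = i,
      optimal_k = fit_k(as.matrix(sub[, protein_cols, drop = FALSE])),
      n_per_group = nrow(sub) / length(unique(sub[[group_col]])),
      samples = paste(sub[[id_col]], collapse = ","),
      fraction_used_so_far = length(used) / nrow(df)
    )
  }
  do.call(rbind, out)
}

# Simulated toy data (made up for illustration): 12 samples in group1,
# 20 samples in group2, two proteins.
set.seed(1)
toy <- data.frame(
  Sample = paste0("s", 1:32),
  Group = rep(c("group1", "group2"), times = c(12, 20)),
  protein1 = c(rnorm(12, 2.5), rnorm(20, 0.2)),
  protein2 = c(rnorm(12, 0.1), rnorm(20, 3.0))
)

# Mclust() needs the package attached (library), not just mclust::Mclust().
if (requireNamespace("mclust", quietly = TRUE)) {
  suppressPackageStartupMessages(library(mclust))
  fit_k_mclust <- function(x) Mclust(x, G = 1:4, verbose = FALSE)$G
  result <- run_balanced_clustering(toy, c("protein1", "protein2"),
                                    fit_k = fit_k_mclust, n_iter = 10)
  print(result[, c("iteration", "optimal_k", "n_per_group",
                   "fraction_used_so_far")])
}
```

Passing the clustering step in as `fit_k` keeps the sampling logic independent of mclust, so it can be checked with a stub function before the real model is plugged in. If unequal draws per group are wanted, as in the desired table above, `n_per_group` would have to be replaced by a per-group rule.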

I would be happy to receive any help :) Thanks a lot.

  • Share reproducible data, and your coding attempts. See related, possible duplicate post: https://stackoverflow.com/q/18258690/680068 – zx8754 Feb 02 '23 at 09:09
