Random sampling of rows based on a categorical column for n iterations in R, then clustering each sampled dataframe with mclust

Question

I have a dataframe where the rows are samples and the columns are proteins. An additional categorical column holds the name of the group each sample belongs to:

Sample	Group	protein1	protein2
s1	group1	2.5	0.1
s2	group2	0.2	3.0

The number of samples differs between groups, so I would like to randomly sample from each group, using the size of the smallest group (say group1) as the sample size, and combine the sampled rows into one dataframe. I then want to cluster that dataframe with mclust. I would like to repeat this for a fixed number of iterations, say 10, until all samples have been used. Every random draw should contain samples from each group. Finally, I would like a table listing, for each iteration, the samples that were selected for clustering and the optimal K that mclust found for them.
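To make the sampling step concrete, here is a minimal sketch of one balanced draw in base R. The toy data, the name `df`, and the helper `sample_balanced()` are all made up for illustration; it assumes the group column is called `Group` and that each draw should take the same number of rows from every group (by default the size of the smallest group).

```r
# Hypothetical toy data in the layout described above (values are random).
set.seed(1)
df <- data.frame(
  Sample   = paste0("s", 1:12),
  Group    = rep(c("group1", "group2"), times = c(5, 7)),
  protein1 = rnorm(12),
  protein2 = rnorm(12),
  stringsAsFactors = FALSE
)

# Draw the same number of rows from every group, without replacement.
# If n_per_group is not given, the size of the smallest group is used.
sample_balanced <- function(data, group_col = "Group", n_per_group = NULL) {
  if (is.null(n_per_group)) n_per_group <- min(table(data[[group_col]]))
  rows_by_group <- split(seq_len(nrow(data)), data[[group_col]])
  picked <- unlist(
    lapply(rows_by_group, function(idx) {
      # sample.int() on the length avoids the sample(x, n) pitfall
      # when a group has exactly one row index
      idx[sample.int(length(idx), n_per_group)]
    }),
    use.names = FALSE
  )
  data[picked, , drop = FALSE]
}

sub <- sample_balanced(df)
table(sub$Group)  # both groups contribute the same number of rows
```

Because every group is sampled separately, each draw is guaranteed to contain rows from each group.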

mclust optimal k	iteration	number of samples from group1	number of samples from group2
2	1	5	5
2	2	5	6
n	..10	5	7
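A table of this shape could be built roughly as follows. This is only a sketch under assumptions: the toy data and all object names (`df`, `results`, `coverage`) are invented, every iteration draws the smallest group's size from each group, and the optimal K is read from the `G` element of the object returned by `Mclust()`, which selects the number of components by BIC. Independent random draws do not guarantee that every sample is used, so the sketch only reports how many were covered.

```r
library(mclust)  # assumes the mclust package is installed

set.seed(42)
# Hypothetical toy data: two groups of unequal size, two protein columns.
df <- data.frame(
  Sample   = paste0("s", 1:30),
  Group    = rep(c("group1", "group2"), times = c(10, 20)),
  protein1 = c(rnorm(10, 0), rnorm(20, 3)),
  protein2 = c(rnorm(10, 0), rnorm(20, 3)),
  stringsAsFactors = FALSE
)

n_iter       <- 10
n_min        <- min(table(df$Group))               # size of the smallest group
protein_cols <- setdiff(names(df), c("Sample", "Group"))

results <- do.call(rbind, lapply(seq_len(n_iter), function(i) {
  # balanced draw: n_min rows from every group, without replacement
  rows <- unlist(
    lapply(split(seq_len(nrow(df)), df$Group),
           function(idx) idx[sample.int(length(idx), n_min)]),
    use.names = FALSE
  )
  sub <- df[rows, ]

  # fit Gaussian mixture models; fit$G is the number of components chosen by BIC
  fit <- Mclust(sub[, protein_cols], verbose = FALSE)

  # per-group counts become columns named n_<group>
  counts     <- table(sub$Group)
  n_by_group <- setNames(as.integer(counts), paste0("n_", names(counts)))

  data.frame(
    iteration = i,
    optimal_k = fit$G,
    as.list(n_by_group),
    samples   = paste(sub$Sample, collapse = ","),
    stringsAsFactors = FALSE
  )
}))

# fraction of all samples that appeared in at least one iteration
used     <- unique(unlist(strsplit(results$samples, ",")))
coverage <- mean(df$Sample %in% used)

results[, c("optimal_k", "iteration", "n_group1", "n_group2")]
```

If every sample must be used at least once, one option is to keep iterating until `coverage` reaches 1 rather than stopping at a fixed `n_iter`.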

I would be happy to receive any help :) Thanks a lot

Share reproducible data, and your coding attempts. See related, possible duplicate post: https://stackoverflow.com/q/18258690/680068 — zx8754, Feb 02 '23 at 09:09

