I have a data frame [Data frame examples] with 113 rows x 54748 total columns. The column headers look like this:
"SampleID" "metadata_1" "metadata_2" "metadata_3" "Gene_1" "Gene_2" ... "Gene_54748"
The goal is to randomly split the data frame by the "Gene_XXX" columns into 10 smaller data frames. Every new subset must keep the same 4 initial columns, i.e. ["SampleID" "metadata_1" "metadata_2" "metadata_3"], plus a random selection of the "Gene_XXX" columns, with the gene columns distributed almost equally in number across the 10 subsets.
Example output:
Subset 1:
"SampleID" "metadata_1" "metadata_2" "metadata_3" "Gene_3" "Gene_8" "Gene_4" ... "Gene_5474"
Subset 2:
"SampleID" "metadata_1" "metadata_2" "metadata_3" "Gene_1" "Gene_6" "Gene_5" ... "Gene_5470"
...
Subset 10:
"SampleID" "metadata_1" "metadata_2" "metadata_3" "Gene_2" "Gene_7" "Gene_9" ... "Gene_5472"
So every original "Gene_XXX" column should appear in exactly one of the 10 subsets, and the assignment should be random (not in order of appearance or alphabetical).
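To make the requirement concrete, here is a rough sketch of the column assignment in Python. It is only an illustration, not a settled solution: the function name `split_gene_columns`, its parameters, and the small 25-gene example are all hypothetical, and it assumes the 4 fixed columns always come first. It works on the column names only, so it does not depend on any particular data frame library.

```python
import random

def split_gene_columns(columns, n_subsets=10, n_fixed=4, seed=None):
    """Return n_subsets lists of column names.

    Each list starts with the first n_fixed columns (assumed to be the
    sample ID and metadata), followed by a random, non-overlapping share
    of the remaining (gene) columns.
    """
    fixed = list(columns[:n_fixed])
    genes = list(columns[n_fixed:])

    # Shuffle a copy of the gene names so the assignment is random.
    rng = random.Random(seed)
    rng.shuffle(genes)

    # Strided slicing deals the shuffled genes out like cards, so the
    # subset sizes differ by at most one and no gene is used twice.
    return [fixed + genes[i::n_subsets] for i in range(n_subsets)]

# Small hypothetical example: 4 fixed columns plus 25 gene columns.
columns = ["SampleID", "metadata_1", "metadata_2", "metadata_3"] + [
    f"Gene_{i}" for i in range(1, 26)
]
groups = split_gene_columns(columns, n_subsets=10, seed=0)
```

If the data were in a pandas DataFrame called `df` (an assumption, since any data frame library with column selection would do), the subsets could then be built with `[df[cols] for cols in groups]`.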
Any ideas on how to do this?
Thank you in advance for any feedback!