I have a dataset of 20 rows with 4 columns: ID, Name, Age, Type. [simplified data set]
Original data set:
> data
ID Name Age Type
1 ABC 23 A
2 CDE 34 A
3 ABCE 23 C
4 CDEYU 34 B
5 ABCW 23 A
6 CDEDR 34 B
7 ASER 23 A
8 CDEAW 34 B
9 ABCHKJ 23 A
10 CDEFDE 34 C
11 ABCDDD 23 A
12 CDEDDD 34 A
13 ABCEDDD 23 C
14 CDEYUDDD 34 B
15 ABCWDDD 23 A
16 CDEDRDDD 34 B
17 ASERDDD 23 A
18 CDEAWDDD 34 B
19 ABCHKJDDD 23 A
20 CDEFDEDDD 34 C
Here the "Type" column is distributed so that the probabilities of A, B, C are (0.5, 0.3, 0.2) respectively, i.e. 10, 6 and 4 of the 20 rows.
Now, I want to split the data into two non-overlapping sets of 10 rows each, so that each set has the same distribution of "Type".
Can I use the sample function to achieve this purpose?
Something like this:
data[sample(nrow(data), 10, replace = FALSE, prob = c(A = 0.5, B = 0.3, C = 0.2)[as.character(data$Type)]), ]
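One caveat with this idea: the `prob` argument of `sample()` only weights the individual draws, so it does not by itself guarantee exactly 5 A, 3 B and 2 C in each set. A minimal sketch of one alternative is to sample within each Type separately (a stratified split). It assumes the data frame is called `data`, that every Type has an even number of rows, and it uses an arbitrary seed; the names `idx1`, `set1` and `set2` are made up for illustration.

```r
# Rebuild the 20-row example from the table above
base_names <- c("ABC", "CDE", "ABCE", "CDEYU", "ABCW",
                "CDEDR", "ASER", "CDEAW", "ABCHKJ", "CDEFDE")
data <- data.frame(
  ID   = 1:20,
  Name = c(base_names, paste0(base_names, "DDD")),
  Age  = rep(c(23, 34), 10),
  Type = rep(c("A", "A", "C", "B", "A", "B", "A", "B", "A", "C"), 2),
  stringsAsFactors = FALSE
)

set.seed(1)  # arbitrary seed, only so the split is reproducible

# For each Type, draw half of that Type's row indices without replacement.
# Assumption: every Type has an even number of rows (10, 6 and 4 here).
rows_by_type <- split(seq_len(nrow(data)), data$Type)
idx1 <- unlist(
  lapply(rows_by_type, function(i) i[sample.int(length(i), length(i) / 2)]),
  use.names = FALSE
)

set1 <- data[sort(idx1), ]  # first set: half of each Type
set2 <- data[-idx1, ]       # second set: all remaining rows

table(set1$Type) / nrow(set1)
table(set2$Type) / nrow(set2)
```

Because the draw happens inside each Type, both sets keep the 0.5 / 0.3 / 0.2 mix exactly and share no rows, although which rows land in which set is random rather than "first ten / last ten".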
Also, how do I write a loop to do this repeatedly for a larger dataset? For example, 10 sets of 10 rows each from a dataset of 100 rows.
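For the larger case, a loop along these lines might work. This is only a sketch: the 100-row data frame `big` is a hypothetical stand-in generated here with 50 A, 30 B and 20 C rows, and `n_sets` and `set_id` are illustrative names. The idea is to shuffle the rows within each Type and deal them out to the sets in turn.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical 100-row data set with the same 0.5 / 0.3 / 0.2 Type mix
big <- data.frame(
  ID   = 1:100,
  Type = sample(rep(c("A", "B", "C"), times = c(50, 30, 20))),
  stringsAsFactors = FALSE
)

n_sets <- 10
set_id <- integer(nrow(big))

# Within each Type, assign the labels 1..n_sets equally often, in random order
for (type in unique(big$Type)) {
  rows <- which(big$Type == type)
  set_id[rows] <- sample(rep_len(seq_len(n_sets), length(rows)))
}

# Collect the rows of each set into a list of data frames
sets <- vector("list", n_sets)
for (k in seq_len(n_sets)) {
  sets[[k]] <- big[set_id == k, ]
}

# Type counts per set (one column per set)
sapply(sets, function(d) table(d$Type))
```

With 50, 30 and 20 rows per Type, every set receives exactly 5 A, 3 B and 2 C. If a Type's row count is not a multiple of `n_sets`, the sets would differ by at most one row of that Type, with the lower-numbered sets getting the extra rows.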
Expected Output:
Dataset 1:
ID Name Age Type
1 ABC 23 A
2 CDE 34 A
3 ABCE 23 C
4 CDEYU 34 B
5 ABCW 23 A
6 CDEDR 34 B
7 ASER 23 A
8 CDEAW 34 B
9 ABCHKJ 23 A
10 CDEFDE 34 C
Dataset 2:
ID Name Age Type
1 ABCDDD 23 A
2 CDEDDD 34 A
3 ABCEDDD 23 C
4 CDEYUDDD 34 B
5 ABCWDDD 23 A
6 CDEDRDDD 34 B
7 ASERDDD 23 A
8 CDEAWDDD 34 B
9 ABCHKJDDD 23 A
10 CDEFDEDDD 34 C
Any help in this regard would be greatly appreciated.