I have a dataframe in pandas that looks like this:
AID VID Freq
0 00016A3E 0127C661 1
1 00016A3E 0C05DA5D 2
2 00016A3E 0C032814 1
3 00016A3E 0BF6C78D 1
4 00016A3E 0A79DFF1 1
5 00016A3E 07BD2FB2 1
6 00016A3E 0790E61B 1
7 00016A3E 0C24ED25 3
8 00016A3E 073630B5 3
9 00016A3E 06613535 1
10 00016A3E 05F809AF 1
11 00016A3E 05C625FF 1
12 00016A3E 04220EA8 4
13 00016A3E 013A29E5 1
14 00016A3E 0761C98A 1
15 00016AE9 0A769475 16
16 00016AE9 0A7DED0A 2
17 00016AE9 0ABF60DF 9
18 00016AFF 0AE3F25A 2
19 00016AFF 0AEFE12F 5
20 00016AFF 0BD8975A 2
21 00016AFF 44DF880B 1
22 00016AFF 43F9E08E 2
23 00016AFF 44EA5E08 2
24 00016AFF 4539ED1E 16
25 00016AFF 8516B55A 4
26 00016AFF 0972AFF2 1
27 00016AFF 0C559B34 1
28 00016AFF 06B5C040 7
29 00016AFF 0B0426FA 1
I want to:
1- Extract all rows where the AID occurs 10 times or more, let's call it df1.
2- From that df1 I want to randomly select 2 rows for each AID and remove them from the df1 into another dataframe, let's call it df2.
So the resulting df's should look like this:
df1: We remove the rows where AID
is 00016AE9
, since there are only 3 occurrences of it.
AID VID Freq
0 00016A3E 0127C661 1
1 00016A3E 0C05DA5D 2
2 00016A3E 0C032814 1
4 00016A3E 0A79DFF1 1
5 00016A3E 07BD2FB2 1
7 00016A3E 0C24ED25 3
8 00016A3E 073630B5 3
9 00016A3E 06613535 1
10 00016A3E 05F809AF 1
11 00016A3E 05C625FF 1
12 00016A3E 04220EA8 4
13 00016A3E 013A29E5 1
14 00016A3E 0761C98A 1
18 00016AFF 0AE3F25A 2
19 00016AFF 0AEFE12F 5
20 00016AFF 0BD8975A 2
21 00016AFF 44DF880B 1
22 00016AFF 43F9E08E 2
23 00016AFF 44EA5E08 2
24 00016AFF 4539ED1E 16
25 00016AFF 8516B55A 4
26 00016AFF 0972AFF2 1
29 00016AFF 0B0426FA 1
df2: I want to randomly select two rows from each set of AIDs and put it in df2:
AID VID Freq
3 00016A3E 0BF6C78D 1
6 00016A3E 0790E61B 1
27 00016AFF 0C559B34 1
28 00016AFF 06B5C040 7