I have a dataset that consists of two columns, idunique and match_no.
Reproducible example:
idunique <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
match_no <- c(1, 1, 1, 1, 2, 2, 3, 3, 4, 5)
df <- data.frame(idunique, match_no)
idunique match_no
1 1
2 1
3 1
4 1
5 2
6 2
7 3
8 3
9 4
10 5
I need to randomly sample match_no values from the dataset and extract x unique occurrences.
Example output would be a random subset of idunique based on randomly sampled match_no:
idunique match_no
1 1
5 2
7 3
9 4
10 5
The real dataset is 6 million rows long, with ~2000 duplicates of each match_no, so I need a solution that lets me change the sample size.
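To make the requirement concrete, here is a minimal base R sketch of one possible reading: draw n distinct match_no values at random, then keep one randomly chosen row for each. The helper name sample_matches and its argument n are made up for illustration, and the "one row per sampled match_no" interpretation is an assumption based on the example output above. It has not been benchmarked on 6 million rows.

```r
# Rebuild the example data from the question
idunique <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
match_no <- c(1, 1, 1, 1, 2, 2, 3, 3, 4, 5)
df <- data.frame(idunique, match_no)

# sample_matches is a hypothetical helper, not part of any package.
# Assumption: we want exactly one random row per sampled match_no.
sample_matches <- function(data, n) {
  ids <- unique(data$match_no)
  stopifnot(n <= length(ids))
  # Sample positions rather than values, so a single remaining id
  # is not misread by sample() as "1:id"
  picked <- ids[sample.int(length(ids), n)]
  rows <- which(data$match_no %in% picked)
  # Within each picked match_no, choose one row index at random
  keep <- tapply(rows, data$match_no[rows],
                 function(i) i[sample.int(length(i), 1)])
  data[sort(as.integer(keep)), ]
}

set.seed(42)  # only to make the draw repeatable
result <- sample_matches(df, 3)
print(result)
```

Changing the sample size is then just a matter of passing a different n, for example sample_matches(df, 5) on the toy data.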