As a follow-up question to this one: Remove duplicated rows using dplyr, I have the following:
How do you randomly remove duplicated rows using dplyr (or other packages)?
My command now is:
data.uniques <- distinct(data, KEYVARIABLE, .keep_all = TRUE)
But it returns the first occurrence of each KEYVARIABLE. I want that behaviour to be random: it should return any one of the 1 to n occurrences of that KEYVARIABLE.
For instance:
KEYVARIABLE BMI
1 24.2
2 25.3
2 23.2
3 18.9
4 19
4 20.1
5 23.0
Currently my command returns:
KEYVARIABLE BMI
1 24.2
2 25.3
3 18.9
4 19
5 23.0
I want it to randomly return one of the n duplicated rows, for instance:
KEYVARIABLE BMI
1 24.2
2 23.2
3 18.9
4 19
5 23.0
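To make the desired behaviour concrete, here is a minimal sketch of two ways it could be done. It assumes dplyr 1.0.0 or later (for slice_sample()) and rebuilds the example table above as a data frame. The set.seed() call is only there to make the draw reproducible and is not part of the original command.

```r
library(dplyr)

# Example data from the question
data <- data.frame(
  KEYVARIABLE = c(1, 2, 2, 3, 4, 4, 5),
  BMI = c(24.2, 25.3, 23.2, 18.9, 19, 20.1, 23.0)
)

set.seed(42)  # assumption: only for reproducibility; drop it for a fresh draw each run

# Option 1: draw one row at random within each KEYVARIABLE group
data.uniques <- data %>%
  group_by(KEYVARIABLE) %>%
  slice_sample(n = 1) %>%
  ungroup()

# Option 2: shuffle all rows first, then keep the first occurrence per key
data.uniques2 <- data %>%
  slice_sample(prop = 1) %>%
  distinct(KEYVARIABLE, .keep_all = TRUE) %>%
  arrange(KEYVARIABLE)
```

Both variants should give one row per KEYVARIABLE, with the row for duplicated keys (2 and 4 here) chosen at random. Option 2 stays closest to the existing distinct() call, since it only changes the row order before deduplicating.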