I have a column (category) in my df that contains either "Yes" or "No". To create a more balanced sample, I want to select the rows with the first 500 cases of "Yes" and the first 500 cases of "No" from my dataset.
I've tried this code:
top_n(df,500, category=="Yes")
But this selects ALL cases of "Yes" instead of only the first 500. I also tried the following, which gave me an error, though I'm fairly sure it makes no sense anyway:
df %>% filter(top_n(500, category == "Yes") & top_n(500, category=="No"))
I'd appreciate a pointer in the right direction.