I have read through similar questions, but mine is slightly different. I have a dataframe (df1) with more than 3 million rows covering 1874 species (scientific_name), where each row has a total value.
I also have another dataframe (df2) which provides the number of rows I want to keep per species (in total around 2 million rows).
What I would like to do is subset/filter df1 according to the number of rows specified in df2, keeping only the rows with the highest total value for each species. E.g. let's imagine that for Cypraeidae in df2, n.at.70 = 1104 (rather than 1). I would then like the resulting df to retain 1104 rows for that species (scientific_name), from the highest total value down to the 1104th highest.
I have been unable to achieve this even for one species, let alone come up with an effective 'apply' or 'for' loop. Any help would be greatly appreciated; I am relatively new to R.
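
To make the desired operation concrete, here is a minimal base R sketch on toy data. It is only one possible approach, not a tested solution for the real data: the column name `total` and the tiny example values are assumptions, and `n.at.70` is assumed to be the column in df2 holding the number of rows to keep. The idea is to join the quota onto df1, rank rows within each species by descending total, and keep the rows whose rank falls within the quota.

```r
# Toy stand-ins for df1 and df2. The column name `total` and all values
# here are hypothetical; only scientific_name and n.at.70 come from the post.
df1 <- data.frame(
  scientific_name = c("Cypraeidae", "Cypraeidae", "Cypraeidae", "Conidae", "Conidae"),
  total = c(5, 9, 7, 3, 8)
)
df2 <- data.frame(
  scientific_name = c("Cypraeidae", "Conidae"),
  n.at.70 = c(2, 1)
)

# Attach each species' row quota to every matching row of df1.
merged <- merge(df1, df2, by = "scientific_name")

# Rank rows within each species, highest total first (rank 1 = highest).
# ties.method = "first" breaks ties by row order so exactly n rows are kept.
merged$rank <- ave(
  -merged$total, merged$scientific_name,
  FUN = function(x) rank(x, ties.method = "first")
)

# Keep only the rows whose rank is within that species' quota.
result <- merged[merged$rank <= merged$n.at.70, c("scientific_name", "total")]
result <- result[order(result$scientific_name, -result$total), ]
rownames(result) <- NULL
```

Because this avoids an explicit loop over species, it should scale better than a 'for' loop, although on 3 million rows the same join-then-rank idea may be faster in a package such as dplyr or data.table. Note that `merge` performs an inner join by default, so species missing from df2 would be dropped entirely.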