This is basically the same problem I had in Excel a few days ago (Excel - find nth largest value based on criteria), but this time in R, because the data set contains half a million entries, which seems to be more than Excel can handle.
I have imported a table from Excel that looks like this:
Country Region      Code  Product name  Year  Value
Sweden  Stockholm   123   Apple         1991  244
Sweden  Kirruna     123   Apple         1987  100
Japan   Kyoto       543   Pie           1987  544
Denmark Copenhagen  123   Apple         1998  787
Denmark Copenhagen  123   Apple         1987  100
Denmark Copenhagen  543   Pie           1991  320
Denmark Copenhagen  126   Candy         1999  200
Sweden  Gothenburg  126   Candy         2013  300
Sweden  Gothenburg  157   Tomato        1987  150
Sweden  Stockholm   125   Juice         1987  250
Sweden  Kirruna     187   Banana        1998  310
Japan   Kyoto       198   Ham           1987  157
Japan   Kyoto       125   Juice         1987  550
Japan   Tokyo       125   Juice         1991  100
What I want to do is write code that gives me the summed value of the nth largest product sold in a specific country. For instance, the most sold product in Sweden is Apple, so I want the code to find that Apple is the most sold product (in total, which is what I am interested in) and then sum all the values of the apples sold in Sweden: 244 + 100 = 344.
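The logic being asked for can be sketched as: keep only the rows for one country, sum the values per product, sort those totals in descending order, and pick the nth entry. The sketch below is written in Python with only the standard library, purely to illustrate the steps (the question itself is about R, where the grouping step would be something like `aggregate()` with `FUN = sum`). The function name `nth_largest_product`, the `ROWS` tuple layout, and the alphabetical tie-breaking are assumptions for illustration; the data is copied from the table above.

```python
from collections import defaultdict

# Rows copied from the question's table: (country, region, code, product, year, value)
ROWS = [
    ("Sweden", "Stockholm", 123, "Apple", 1991, 244),
    ("Sweden", "Kirruna", 123, "Apple", 1987, 100),
    ("Japan", "Kyoto", 543, "Pie", 1987, 544),
    ("Denmark", "Copenhagen", 123, "Apple", 1998, 787),
    ("Denmark", "Copenhagen", 123, "Apple", 1987, 100),
    ("Denmark", "Copenhagen", 543, "Pie", 1991, 320),
    ("Denmark", "Copenhagen", 126, "Candy", 1999, 200),
    ("Sweden", "Gothenburg", 126, "Candy", 2013, 300),
    ("Sweden", "Gothenburg", 157, "Tomato", 1987, 150),
    ("Sweden", "Stockholm", 125, "Juice", 1987, 250),
    ("Sweden", "Kirruna", 187, "Banana", 1998, 310),
    ("Japan", "Kyoto", 198, "Ham", 1987, 157),
    ("Japan", "Kyoto", 125, "Juice", 1987, 550),
    ("Japan", "Tokyo", 125, "Juice", 1991, 100),
]


def nth_largest_product(rows, country, n=1):
    """Hypothetical helper: return (product, total) for the product with the
    nth largest summed value in `country` (n=1 is the best seller)."""
    totals = defaultdict(int)
    for row_country, _region, _code, product, _year, value in rows:
        if row_country == country:
            totals[product] += value
    # Highest total first; ties broken alphabetically (an assumption) for stability.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    if not 1 <= n <= len(ranked):
        raise ValueError(f"{country} has only {len(ranked)} products")
    return ranked[n - 1]


print(nth_largest_product(ROWS, "Sweden"))       # ('Apple', 344)
print(nth_largest_product(ROWS, "Sweden", n=2))  # ('Banana', 310)
```

With n=2 the same call returns Banana (310), since Sweden's totals rank Apple 344, Banana 310, Candy 300, Juice 250, Tomato 150.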
I also want to be able to find the nth largest value based on both country and year. That is, if I am looking for the most sold product in Sweden in the year 2013, it should return the product Candy and the value 300.
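For the country-and-year case, one option (sketched here in Python as an illustration of the logic, not R code) is to sum the values once per (country, year) group and per product, then answer each query by ranking only the products in that one group. With half a million rows this means a single pass over the data, and each lookup then sorts just a handful of products. The names `build_totals` and `nth_largest_in_year`, the tuple layout, and the alphabetical tie-breaking are hypothetical; the rows are copied from the table above.

```python
from collections import defaultdict

# Rows copied from the question's table: (country, region, code, product, year, value)
ROWS = [
    ("Sweden", "Stockholm", 123, "Apple", 1991, 244),
    ("Sweden", "Kirruna", 123, "Apple", 1987, 100),
    ("Japan", "Kyoto", 543, "Pie", 1987, 544),
    ("Denmark", "Copenhagen", 123, "Apple", 1998, 787),
    ("Denmark", "Copenhagen", 123, "Apple", 1987, 100),
    ("Denmark", "Copenhagen", 543, "Pie", 1991, 320),
    ("Denmark", "Copenhagen", 126, "Candy", 1999, 200),
    ("Sweden", "Gothenburg", 126, "Candy", 2013, 300),
    ("Sweden", "Gothenburg", 157, "Tomato", 1987, 150),
    ("Sweden", "Stockholm", 125, "Juice", 1987, 250),
    ("Sweden", "Kirruna", 187, "Banana", 1998, 310),
    ("Japan", "Kyoto", 198, "Ham", 1987, 157),
    ("Japan", "Kyoto", 125, "Juice", 1987, 550),
    ("Japan", "Tokyo", 125, "Juice", 1991, 100),
]


def build_totals(rows):
    """Map (country, year) -> {product: summed value}, built in one pass."""
    totals = defaultdict(lambda: defaultdict(int))
    for country, _region, _code, product, year, value in rows:
        totals[(country, year)][product] += value
    return totals


def nth_largest_in_year(totals, country, year, n=1):
    """Return (product, total) for the nth best-selling product in
    `country` during `year` (n=1 is the best seller)."""
    group = totals.get((country, year), {})  # .get does not create empty groups
    # Highest total first; ties broken alphabetically (an assumption).
    ranked = sorted(group.items(), key=lambda item: (-item[1], item[0]))
    if not 1 <= n <= len(ranked):
        raise ValueError(f"only {len(ranked)} products for {country} in {year}")
    return ranked[n - 1]


TOTALS = build_totals(ROWS)
print(nth_largest_in_year(TOTALS, "Sweden", 2013))  # ('Candy', 300)
```

The same `TOTALS` can be reused for other queries, e.g. Sweden in 1987 ranks Juice (250) first and Tomato (150) second.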