Count 5 highest values of a variable

Question

I am using R to find each user's top 5 favourite songs, ranked by how often they play each song. My current code finds the highest play count for each user, but I would like to know how to get the next 4 most-played songs as well, assuming every user has played at least 5 songs. Would I have to remove the highest values from the dataset and run the code again, or is there an easier way?

write.csv(group_by(mydata, userId) %>%
            summarise(favourite = max(playCount)), file = "test.csv")

An example of the data looks like this:

userId      songId            playCount
A           568r              85
A           711g              18
C           34n               18
E           454j              65
D           663a              72
B           35d               84
A           34c               72
A           982s              65
E           433f              11
A           565t              7

Related: [*Fastest way to find second (third…) highest/lowest value in vector or column*](http://stackoverflow.com/questions/2453326/fastest-way-to-find-second-third-highest-lowest-value-in-vector-or-column/) — Blue Magister, Mar 05 '14 at 22:12

Christopher Louden · Answer 1 · 2014-03-05T21:44:16.617

4

You can use:

rev(sort(x))[1:n]

to get the top n values of a vector. If you want the top n unique values, add a call to `unique()`:

rev(sort(unique(x)))[1:n]
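
To apply this per user, one option is to split the play counts by `userId` and take the top n of each piece. The following is a minimal base R sketch, assuming the data frame is called `mydata` as in the question; the helper name `top_n_values` is made up for illustration. It uses `head(sort(x, decreasing = TRUE), n)` rather than `rev(sort(x))[1:n]`, because the indexing form pads the result with `NA` when a user has fewer than n songs.

```r
# Sample data copied from the question
mydata <- data.frame(
  userId    = c("A", "A", "C", "E", "D", "B", "A", "A", "E", "A"),
  songId    = c("568r", "711g", "34n", "454j", "663a", "35d",
                "34c", "982s", "433f", "565t"),
  playCount = c(85, 18, 18, 65, 72, 84, 72, 65, 11, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of base R): the n largest values of x,
# in decreasing order. head() returns fewer than n values when x is short.
top_n_values <- function(x, n = 5) {
  head(sort(x, decreasing = TRUE), n)
}

# One vector of play counts per user, then the top 5 of each
top_by_user <- lapply(split(mydata$playCount, mydata$userId),
                      top_n_values, n = 5)

top_by_user[["A"]]   # 85 72 65 18 7
top_by_user[["E"]]   # 65 11 (user E has only two songs in the sample)

# For comparison, the indexing form pads with NA:
rev(sort(c(65, 11)))[1:5]   # 65 11 NA NA NA
```

This returns play counts only. To keep the matching `songId`, order the rows of the data frame instead of sorting the bare vector.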

edited Mar 05 '14 at 21:44

answered Mar 05 '14 at 21:35


2

It might speed things up a little bit to use the `partial` argument to `sort.int` (which `sort` calls). Also instead of using `rev` you could set `decreasing=TRUE`, or just use `tail`. – Greg Snow Mar 05 '14 at 22:04
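
The alternatives mentioned in this comment can be sketched as follows. The vector `x` holds user A's play counts from the question and `n = 3` is an arbitrary choice. One caveat: partial sorting does not support `decreasing = TRUE`, so the sketch partially sorts in ascending order, takes the tail, and then orders just those n values. Whether this is actually faster depends on the size of the data and has not been measured here.

```r
x <- c(85, 18, 72, 65, 7)   # user A's play counts from the question
n <- 3

# Option 1: sort in decreasing order directly instead of calling rev()
top_decreasing <- head(sort(x, decreasing = TRUE), n)

# Option 2: ascending sort, keep the last n values, then reverse them
top_tail <- rev(tail(sort(x), n))

# Option 3: partial sort. After sort(x, partial = k), the element at
# position k is in its final sorted place and every later element is
# at least as large, so the last n positions hold the n largest values
# (in no guaranteed order among themselves).
k <- length(x) - n + 1
top_partial <- sort(tail(sort(x, partial = k), n), decreasing = TRUE)

top_decreasing   # 85 72 65
top_tail         # 85 72 65
top_partial      # 85 72 65
```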

maloneypatr · Answer 2 · 2014-03-05T22:15:38.717

2

Another way...

library(dplyr)

mydata2 <- group_by(mydata, userId) %>%
              arrange(userId, -playCount) %>%
              mutate(rank = rank(-playCount)) %>%

              # remove `rank > 1` if you want to keep the first song
              filter(rank > 1, rank < 6) %>%

              select(userId, songId, playCount)
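
On recent dplyr releases, the same per-user selection can be sketched with `slice_max()`. This assumes dplyr 1.0.0 or later, which is newer than this answer, and a data frame named `mydata` as in the question. Unlike the `filter()` call above, this sketch keeps the most-played song, and `with_ties = FALSE` caps each user at exactly 5 rows even when play counts are tied (`rank()` by default assigns tied values an averaged, possibly fractional, rank).

```r
library(dplyr)

# Sample data copied from the question
mydata <- data.frame(
  userId    = c("A", "A", "C", "E", "D", "B", "A", "A", "E", "A"),
  songId    = c("568r", "711g", "34n", "454j", "663a", "35d",
                "34c", "982s", "433f", "565t"),
  playCount = c(85, 18, 18, 65, 72, 84, 72, 65, 11, 7),
  stringsAsFactors = FALSE
)

# Up to 5 most-played songs per user, highest play count first
top5 <- mydata %>%
  group_by(userId) %>%
  slice_max(playCount, n = 5, with_ties = FALSE) %>%
  ungroup()

# User A's rows, in decreasing order of playCount
top5[top5$userId == "A", c("songId", "playCount")]
```

Users with fewer than 5 songs simply contribute fewer rows, so no `NA` padding appears in the result.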

edited Mar 05 '14 at 22:15

answered Mar 05 '14 at 22:10

