I am using R to find each user's top 5 favourite songs, ranked by how often they play them. My current code returns the highest play count for each user, but how do I get the next 4 most-played songs for that user, assuming every user has played at least 5 songs? Do I have to remove the highest values from the dataset and run it again, or is there an easier way?

library(dplyr)

write.csv(group_by(mydata, userId) %>%
            summarise(favourite = max(playCount)), file = "test.csv")

An example of the data looks like this:

userId      songId            playCount
A           568r              85
A           711g              18
C           34n               18
E           454j              65
D           663a              72
B           35d               84
A           34c               72
A           982s              65
E           433f              11
A           565t              7
Cormac
  • Related: [*Fastest way to find second (third…) highest/lowest value in vector or column*](http://stackoverflow.com/questions/2453326/fastest-way-to-find-second-third-highest-lowest-value-in-vector-or-column/) – Blue Magister Mar 05 '14 at 22:12

2 Answers

You can use:

rev(sort(x))[1:n]

to get the top n values of a vector. If you want the top n unique values, add a call to unique():

rev(sort(unique(x)))[1:n]
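
To apply the one-liner above per user rather than to a single vector, one option is to order the rows and keep the first n of each group. The following is a minimal base-R sketch, assuming a data frame shaped like the question's example; the helper name `top_n_per_user` is hypothetical and introduced only for illustration. Note that `[1:n]` pads the result with `NA` when a vector has fewer than n elements, which is why the sketch filters on position instead.

```r
# Example data copied from the question
mydata <- data.frame(
  userId    = c("A", "A", "C", "E", "D", "B", "A", "A", "E", "A"),
  songId    = c("568r", "711g", "34n", "454j", "663a",
                "35d", "34c", "982s", "433f", "565t"),
  playCount = c(85, 18, 18, 65, 72, 84, 72, 65, 11, 7),
  stringsAsFactors = FALSE
)

# Single vector: the three highest play counts for user A
x <- mydata$playCount[mydata$userId == "A"]
top3_counts <- rev(sort(x))[1:3]

# Hypothetical helper: the n most-played rows for every user
top_n_per_user <- function(df, n = 5) {
  # Sort by user, then by descending play count
  ordered <- df[order(df$userId, -df$playCount), ]
  # Position of each row within its user's block (1 = most played)
  position <- ave(seq_len(nrow(ordered)), ordered$userId, FUN = seq_along)
  ordered[position <= n, ]
}

top3 <- top_n_per_user(mydata, n = 3)
```

Users with fewer than n songs simply contribute all of their rows, so no `NA` padding appears.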
Christopher Louden
    It might speed things up a little bit to use the `partial` argument to `sort.int` (which `sort` calls). Also instead of using `rev` you could set `decreasing=TRUE`, or just use `tail`. – Greg Snow Mar 05 '14 at 22:04
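
The variants mentioned in this comment can be sketched roughly as follows. This is an illustration under the assumption of a plain numeric vector with no `NA` values; the variable names are made up for the example. Partial sorting cannot be combined with `decreasing = TRUE`, so the sketch sorts ascending and takes the tail.

```r
x <- c(85, 18, 72, 65, 7)
n <- 3

# Variant 1: sort in decreasing order directly instead of calling rev()
top_a <- sort(x, decreasing = TRUE)[1:n]

# Variant 2: sort ascending and take the last n values with tail()
top_b <- rev(tail(sort(x), n))

# Variant 3: partial sort, which only guarantees that position k holds the
# k-th smallest value and that everything after it is at least as large
k <- length(x) - n + 1
top_c <- sort(tail(sort(x, partial = k), n), decreasing = TRUE)
```

All three give the same values here; any speed benefit of the partial sort would only matter for long vectors and is worth measuring rather than assuming.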

Another way...

library(dplyr)

mydata2 <- group_by(mydata, userId) %>%
              arrange(userId, desc(playCount)) %>%
              mutate(rank = rank(-playCount)) %>%

              # drop the `rank > 1` condition to keep the most-played song as well
              filter(rank > 1, rank < 6) %>%

              select(userId, songId, playCount)
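
One caveat with `rank()` is that tied play counts receive averaged, fractional ranks by default, so a filter on rank bounds may keep more or fewer rows than intended. The sketch below shows two alternatives, assuming dplyr 1.0.0 or later (for `slice_max()`) and the example data from the question; the names `top_five`, `ranked`, and `next_four` are illustrative only.

```r
library(dplyr)

# Example data copied from the question
mydata <- data.frame(
  userId    = c("A", "A", "C", "E", "D", "B", "A", "A", "E", "A"),
  songId    = c("568r", "711g", "34n", "454j", "663a",
                "35d", "34c", "982s", "433f", "565t"),
  playCount = c(85, 18, 18, 65, 72, 84, 72, 65, 11, 7),
  stringsAsFactors = FALSE
)

# Top 5 songs per user; with_ties = FALSE caps each user at 5 rows
top_five <- mydata %>%
  group_by(userId) %>%
  slice_max(playCount, n = 5, with_ties = FALSE) %>%
  ungroup()

# Songs 2 to 5 per user, using row positions instead of rank()
ranked <- mydata %>%
  group_by(userId) %>%
  arrange(desc(playCount), .by_group = TRUE) %>%
  mutate(position = row_number()) %>%
  ungroup()

next_four <- filter(ranked, position > 1, position < 6)
```

Using `row_number()` breaks ties by row order, which is arbitrary; whether that is acceptable depends on how tied songs should be treated.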
maloneypatr