
Sample data

mysample <- data.frame(ID = 1:100, kWh = rnorm(100))

I'm trying to automate the process of returning the rows in a data frame that contain the 5 highest values in a certain column. In the sample data, the 5 highest values in the "kWh" column can be found using the code:

(tail(sort(mysample$kWh), 5))

which in my case returns:

[1] 1.477391 1.765312 1.778396 2.686136 2.710494

I would like to create a table containing the rows that have these values in column 2. I am attempting to use this code:

mysample[mysample$kWh == (tail(sort(mysample$kWh), 5)),]

This returns:

   ID      kWh  
87 87 1.765312

I would like it to return the 5 rows that contain the figures above in the "kWh" column. I'm sure I've missed something basic but I can't figure it out.

Rich Scriven
Volcanic

1 Answer


We can use rank to add a ranking column, then order by it and take the first 5 rows:

mysample$Rank <- rank(-mysample$kWh)
head(mysample[order(mysample$Rank),],5)
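
As a minimal sketch of how the rank column can also be used as a filter (the approach the asker settles on in the comments), assuming a fixed seed for reproducibility and `ties.method = "first"` so that tied values still get distinct ranks; neither assumption is part of the original code, and `top5` is just an illustrative name:

```r
# Assumption: fixed seed so the sample data is reproducible
set.seed(42)
mysample <- data.frame(ID = 1:100, kWh = rnorm(100))

# Rank 1 is the largest kWh; ties.method = "first" (an assumption here)
# guarantees distinct integer ranks even if values repeat
mysample$Rank <- rank(-mysample$kWh, ties.method = "first")

# Keep the rows ranked 1 to 5, then sort them from highest to lowest
top5 <- mysample[mysample$Rank <= 5, ]
top5 <- top5[order(top5$Rank), ]
top5
```

With the default `ties.method = "average"`, tied values share a fractional rank, so a filter such as `Rank <= 5` could return more or fewer than 5 rows on real data that contains duplicates.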

If we don't need to create a column, we can use order directly (@Jaap suggested these three alternatives):

# order descending and get the first 5 rows
head(mysample[order(-mysample$kWh), ], 5)
# order ascending and get the last 5 rows
tail(mysample[order(mysample$kWh), ], 5)
# or index the first 5 rows directly (note the comma: [1:5, ] selects rows,
# whereas [1:5] would try to select columns)
mysample[order(-mysample$kWh), ][1:5, ]
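
For completeness, here is a hedged sketch of why the attempt in the question returns too few rows, and how `%in%` behaves instead. The seed and the names `top_values`, `by_equal`, `by_in` and `by_order` are assumptions introduced for illustration only:

```r
# Assumption: fixed seed so the sample data is reproducible
set.seed(42)
mysample <- data.frame(ID = 1:100, kWh = rnorm(100))

top_values <- tail(sort(mysample$kWh), 5)

# `==` compares element by element and recycles the 5 values across the
# 100 rows, so a row only matches if its value happens to line up with
# the recycled position of the same value
by_equal <- mysample[mysample$kWh == top_values, ]

# `%in%` tests each row's kWh against the whole set of 5 values
by_in <- mysample[mysample$kWh %in% top_values, ]

# The order-based approach from this answer, for comparison
by_order <- head(mysample[order(-mysample$kWh), ], 5)
```

Here `by_in` holds the same rows as `by_order`, just in the original row order. Note that `%in%` would return more than 5 rows if any of the top values were duplicated in the data, whereas the `order` approach always returns exactly 5.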
akrun
  • why not just `head(mysample[order(-mysample$kWh),],5)`? – Jaap Feb 01 '16 at 15:48
  • @Jaap Yes, it is possible, but I thought I read something like to create a new column or so. – akrun Feb 01 '16 at 15:49
  • @Jaap I used tail instead of head after reading this question http://stackoverflow.com/questions/3692563/how-to-return-5-topmost-values-from-vector-in-r – Volcanic Feb 01 '16 at 15:51
  • I've tested akrun's answer and Jaap's, both work with my sample data. I like Jaap's because it's neater and doesn't add a column to the data. However, it doesn't work with my actual data, so I'm going to have to do some more investigating. The question as asked was answered, many thanks. – Volcanic Feb 01 '16 at 16:07
  • @Volcanic i added column as i thought you may need it – akrun Feb 01 '16 at 16:08
  • @akrun maybe you can add these alternatives as well: `tail(mysample[order(mysample$kWh),],5)` & `mysample[order(-mysample$kWh),][1:5]` – Jaap Feb 01 '16 at 16:14
  • simple enough for my needs: `mysample[mysample$Rank<6,]` @akrun, the rank column was needed after all. This works with my real data. Very happy. – Volcanic Feb 01 '16 at 16:19