R: Keeping the top 10 rows for each date in my data frame

Question

I have a data set of stock data sorted by date and then, within each date, by magnitude of return from largest to smallest. Each day has about 800 stocks in it.

How would I create a new data frame containing only the 10 stocks with the largest return on each day?

So I need only the top 10 from each date; I don't care about the others.

Please [read this on creating a reproducible example](http://stackoverflow.com/q/5963269/4002530) — tospig, Apr 30 '15 at 03:16
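A minimal reproducible version of the data described in the question might look like the sketch below. The column names `date` and `stock` (holding the return) are assumptions chosen to match the answers, the tickers are hypothetical, and the returns are random numbers, not real market data.

```r
# Hypothetical reproducible example: 3 trading days x 800 stocks,
# with made-up tickers and normally distributed "returns" in `stock`.
set.seed(42)
n_days   <- 3
n_stocks <- 800
df <- data.frame(
  date   = rep(as.Date("2015-04-27") + 0:(n_days - 1), each = n_stocks),
  ticker = rep(sprintf("S%03d", 1:n_stocks), times = n_days),
  stock  = rnorm(n_days * n_stocks)
)

# Sort by date, then by return from largest to smallest within each date
df <- df[order(df$date, -df$stock), ]
rownames(df) <- NULL
head(df)
```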

score 2 · Answer 1 · answered Apr 30 '15 at 03:27

dplyr is your friend. Something like:

```r
library(dplyr)
new_df <- df %>% group_by(date) %>% top_n(10, stock)
```

goodtimeslim

Thanks. That's exactly what I wanted. I didn't know about the top_n() function. – blackhawks797 Apr 30 '15 at 03:31
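A self-contained sketch of the dplyr approach, assuming dplyr 1.0.0 or later is installed and using small hypothetical data. `top_n()` keeps tied values, so a date can return more than 10 rows; its newer replacement `slice_max()` with `with_ties = FALSE` caps each date at exactly 10 rows.

```r
library(dplyr)

# Hypothetical data: 2 dates x 20 stocks, `stock` holds the return
set.seed(1)
df <- data.frame(
  date  = rep(as.Date("2015-04-29") + 0:1, each = 20),
  stock = rnorm(40)
)

# top_n() keeps ties, so a group can return more than 10 rows
top_ties <- df %>% group_by(date) %>% top_n(10, stock) %>% ungroup()

# slice_max() (dplyr >= 1.0.0) supersedes top_n(); with_ties = FALSE
# guarantees at most 10 rows per date
top10 <- df %>%
  group_by(date) %>%
  slice_max(stock, n = 10, with_ties = FALSE) %>%
  ungroup()

print(top10, n = 20)
```

With continuous random returns ties are practically impossible, so both results have 20 rows here; the difference only shows up when several stocks share the 10th-largest return on a date.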

score 2 · Answer 2 · answered Apr 30 '15 at 04:21

data.table has a quick way to do this too.

```r
library(data.table)

# here are 20 random values on each of two dates
fake_data <-
  data.table(date=rep(Sys.Date()-1:2, each=20),
             stock=rnorm(20*2))

# group by date, order each group's rows (.SD) by stock
# return (descending), and take the first 10 rows
fake_data[, .SD[order(-stock)][1:10,], by=date]
```

arvi1000

Also `fake_data[order(date, -stock), head(.SD, 10L), by=date]`, although this might not always preserve the order. – Arun Apr 30 '15 at 06:36
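A sketch contrasting the answer's `[1:10, ]` indexing with a `head()` variant in the spirit of the comment above, assuming data.table is installed and using hypothetical data where one date has fewer than 10 stocks. Indexing past the end of a group pads it with `NA` rows, while `head(..., 10L)` simply returns every row that exists.

```r
library(data.table)

# Hypothetical data: one date with 20 stocks, one with only 5
set.seed(7)
dt <- data.table(
  date  = rep(as.Date(c("2015-04-28", "2015-04-29")), times = c(20, 5)),
  stock = rnorm(25)
)

# [1:10, ] pads a group with fewer than 10 rows with NA rows
padded <- dt[, .SD[order(-stock)][1:10, ], by = date]

# head(..., 10L) returns at most 10 rows per group and never pads
top10 <- dt[, head(.SD[order(-stock)], 10L), by = date]
top10
```

With 20 rows on every date, as in `fake_data` above, both expressions return the same 20 rows; they only differ when a date has fewer than 10 stocks.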