I have a data set of stock data ordered by date and then ordered by magnitude of return from largest to smallest on that date. Each day has about 800 stocks in it.

How would I create a new data frame containing only the 10 stocks with the largest return on each day?

I only need the top 10 from each date; I don't care about the others.

2 Answers

dplyr is your friend. Something like this, where `stock` is the column holding the return:

library(dplyr)

new_df <- df %>% group_by(date) %>% top_n(10, stock)
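
For a self-contained illustration, here is a minimal sketch of the same grouping idea using `slice_max()`, which superseded `top_n()` in dplyr 1.0.0. It assumes dplyr >= 1.0.0 is installed; the toy data and the column names `ticker` and `ret` are made up for the example and are not from the question.

```r
library(dplyr)

# Hypothetical toy data: 3 dates x 25 stocks, with made-up returns.
set.seed(1)
df <- data.frame(
  date   = rep(as.Date("2015-04-27") + 0:2, each = 25),
  ticker = rep(paste0("S", 1:25), times = 3),
  ret    = rnorm(75)
)

# For each date, keep the 10 rows with the largest return.
# with_ties = FALSE caps the result at 10 rows per date even if returns tie.
top10 <- df %>%
  group_by(date) %>%
  slice_max(ret, n = 10, with_ties = FALSE) %>%
  ungroup()
```

Unlike `top_n()`, which keeps tied rows and can therefore return more than 10 per group, `with_ties = FALSE` gives at most 10 rows per date. If the data is already sorted by descending return within each date, as in the question, `slice_head(n = 10)` in place of `slice_max()` should give the same rows.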
goodtimeslim

data.table has a quick way to do this too.

library(data.table)

# here are 20 random values on each of two dates
fake_data <-
  data.table(date=rep(Sys.Date()-1:2, each=20), 
             stock=rnorm(20*2))

# group the data by date, then order each group's rows (.SD)
# by stock return (descending) and take the first 10 rows
fake_data[, .SD[order(-stock)][1:10,], by=date]
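
A variant worth considering, sketched under the assumption that some dates may hold fewer than 10 stocks: as far as I can tell, indexing with `1:10` pads such groups with `NA` rows, whereas `head(.SD, 10L)` simply returns what is there. The table `dt` and its contents below are hypothetical toy data.

```r
library(data.table)

set.seed(42)
# Hypothetical toy data: one date with 20 stocks, one with only 5.
dt <- data.table(
  date  = rep(as.Date(c("2015-04-28", "2015-04-29")), times = c(20, 5)),
  stock = rnorm(25)
)

# Sort in place by date, then by descending return,
# and keep up to 10 rows per date.
setorder(dt, date, -stock)
top10 <- dt[, head(.SD, 10L), by = date]
```

Here `setorder()` sorts `dt` by reference, so the original row order is lost; work on `copy(dt)` if that matters.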
arvi1000
  • Also `fake_data[order(date, -stock), head(.SD, 10L), by=date]` although this might not always preserve the order. – Arun Apr 30 '15 at 06:36