I am trying to drop observations with prices in the top and bottom one percent, by year. I have been attempting to use dplyr
's group_by
function to group by year_sold
and then mutate()
to create a variable to_drop
whose value is conditional on the variable price
being between the 1st and 99th percentile. Here's what I have so far:
df <- df %>% dplyr::group_by(year_sold) %>%
mutate(to_drop = ifelse(price <= quantile(price,0.01) | price >= quantile(price,0.99),1,0))
However, I am not getting the quantiles of price
grouped by year_sold. Removing dplyr::group_by(year_sold) %>%
doesn't seem to change my results.
I'm trying to find alternatives to Stata's very useful bysort
command. Here's how I would have done this in Stata:
gen to_drop = 0
foreach y in year_sold {
quietly bysort `y': summarize price, detail // To get r(p1) and r(p99)
bysort `y': replace to_drop = 1 if ! inrange(price, r(p1), r(p99))
}
Can someone please help me either figure out why group_by isn't working as I would expect it to, or alternatively, help me figure out another way of achieving this task in R?