I have a data.table with an amount column like:
library(data.table)
n <- 1e5
set.seed(1)
dt <- data.table(id = 1:n, amount = pmax(0, rnorm(n, mean = 5e3, sd = 1e4)))
And a vector of breaks given like:
breaks <- as.vector( c(0, t(sapply(c(1, 2.5, 5, 7.5), function(x) x * 10^(1:4))) ) )
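In case the nested sapply/t construction is hard to read: as far as I can tell, the same vector can be built with outer(). This is only a sketch for readability; breaks_alt is a name made up for the comparison.

```r
# Original construction from the question
breaks <- as.vector(c(0, t(sapply(c(1, 2.5, 5, 7.5), function(x) x * 10^(1:4)))))

# Alternative sketch: multipliers vary fastest within each power of ten,
# so flattening the outer product column by column gives an increasing vector
breaks_alt <- c(0, as.vector(outer(c(1, 2.5, 5, 7.5), 10^(1:4))))

print(breaks_alt)
```

Both give 17 increasing values, starting at 0 and ending at 75000.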
For each interval defined by these breaks, I want to use data.table syntax to:
1. get counts of amount contained
2. get counts of amount equal to or greater than the left bound (basically n * (1 - cdf(amount)))
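To make the two targets concrete, here is a tiny base R sketch on made-up values (x and b are hypothetical, not the real data). It assumes intervals are closed on the left and open on the right, which is what findInterval uses by default, with the last interval unbounded above.

```r
x <- c(3, 12, 12, 80)        # hypothetical amounts
b <- c(0, 10, 25, 50, 75)    # hypothetical breaks (left bounds)
upper <- c(b[-1], Inf)       # right bound of each interval

# Target 1: number of amounts inside each interval [b[i], upper[i])
in_interval <- sapply(seq_along(b), function(i) sum(x >= b[i] & x < upper[i]))

# Target 2: number of amounts at or above each left bound
at_or_above <- sapply(b, function(lo) sum(x >= lo))

print(in_interval)   # 1 2 0 0 1
print(at_or_above)   # 4 3 1 1 1
```

Note that the empty intervals (25 and 50 here) still get a row with a count of 0 for the first target.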
For 1, this mostly works, but doesn't return rows for the empty intervals:
dt[, .N, keyby = breaks[findInterval(amount,breaks)] ] #would prefer to get 0 for empty intvl
For 2, I tried:
dt[, sum(amount >= thresh[.GRP]), keyby = breaks[findInterval(amount,breaks)] ]
but it didn't work because sum is restricted to within the group, not beyond. So I came up with a workaround, which also returns the empty intervals:
dt[, cbind(breaks, sapply(breaks, function(x) sum(amount >= x)))] # desired result
So, what's the data.table way to fix my 2. and to get the empty intervals for both?
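One direction that might work, as a rough sketch rather than a definitive answer: label each row with its left bound, join against the full breaks vector with by = .EACHI so that unmatched breaks come back with .N equal to 0, and then take a reverse cumulative sum for the "greater than or equal" counts. The names lb, res and n_ge are made up for illustration.

```r
library(data.table)

# Data and breaks exactly as in the question
n <- 1e5
set.seed(1)
dt <- data.table(id = 1:n, amount = pmax(0, rnorm(n, mean = 5e3, sd = 1e4)))
breaks <- as.vector(c(0, t(sapply(c(1, 2.5, 5, 7.5), function(x) x * 10^(1:4)))))

# Label each row with the left bound of its interval, then join against all
# breaks; with by = .EACHI, .N is 0 for breaks without matching rows,
# so the empty intervals are kept.
res <- dt[, .(lb = breaks[findInterval(amount, breaks)])][
  .(lb = breaks), on = "lb", .N, by = .EACHI]

# Reverse cumulative sum: number of rows with amount >= each left bound
res[, n_ge := rev(cumsum(rev(N)))]

print(res)
```

The reverse cumulative sum relies on every amount falling in some interval, which holds here because amount is never below the first break of 0.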