I need to get the interval borders from cut() output. I found this question, which suggests using findInterval(), but it does not work as expected if a value of x is the same as the upper border of its cut(x) group. See here:
x <- 1:3
breaks <- c(min(x), 2, max(x))
interval <- findInterval(x, breaks)
data.frame(x,
           groups = cut(x, breaks, include.lowest = TRUE),
           x_lower = breaks[interval],
           x_upper = breaks[interval + 1],
           interval)
  x groups x_lower x_upper interval
1 1  [1,2]       1       2        1
2 2  [1,2]       2       3        2
3 3  (2,3]       3      NA        3
I am happy with how cut() builds groups from x, but x_lower and x_upper in rows 2 and 3 are not what I expect. In row 2, x is 2 and groups is [1,2], so I expect x_lower to be 1 and x_upper to be 2. In row 3, x is 3 and groups is (2,3], so I expect x_lower to be 2 and x_upper to be 3. If you play around with the data, you will see that whenever an x value equals the upper border of its group, findInterval() assigns it to the next interval, so the lower and upper values returned belong to the following group. I want to avoid that. How can we achieve this?
Expected output
structure(list(x = 1:3, groups = structure(c(1L, 1L, 2L), .Label = c("[1,2]", "(2,3]"), class = "factor"), x_lower = c(1, 1, 2), x_upper = c(2, 2, 3), interval = c(1, 1, 2)), class = "data.frame", row.names = c(NA, -3L))
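One possible way to get this expected output while still using findInterval() is sketched below. It assumes an R version recent enough to provide the left.open argument of findInterval(). With left.open = TRUE the intervals become (lower, upper], and rightmost.closed = TRUE then closes the leftmost interval, which seems to mirror cut(x, breaks, include.lowest = TRUE). The variable name result is only used for illustration.

```r
x <- 1:3
breaks <- c(min(x), 2, max(x))

# left.open = TRUE: every interval is (lower, upper], like cut()'s default right = TRUE.
# Combined with left.open = TRUE, rightmost.closed = TRUE closes the leftmost
# interval instead, which plays the role of include.lowest = TRUE in cut().
interval <- findInterval(x, breaks, left.open = TRUE, rightmost.closed = TRUE)

result <- data.frame(
  x,
  groups  = cut(x, breaks, include.lowest = TRUE),
  x_lower = breaks[interval],
  x_upper = breaks[interval + 1],
  interval
)
print(result)
```

Note that interval comes back as an integer vector here, whereas the dput() above stores it as a double.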
Remark
I do want to use findInterval(), and I cannot use labels[as.numeric(groups)] as suggested in another answer to the question above. This is because in my situation x is sometimes a numeric and sometimes a Date/POSIXct/ts/... vector, so using as.numeric() is not safe for me.