As far as I know, there is no cutting/breaking function in base R that lets you specify irregular breaks like that. You could wrap findInterval to do some of the manipulations:
    findInterval2 <- function(x, br, rightmost.closed = FALSE, left.closed = TRUE,
                              trim = FALSE, labels = NULL) {
        n <- length(br)
        # Plain findInterval() puts every break in the interval to its right:
        # 0 = below br[1], i = [br[i], br[i+1]), n = [br[n], Inf)
        r <- findInterval(x, br)
        # Recycle left.closed over the n-1 bounded intervals. rightmost.closed
        # plays the same role for the open-ended interval starting at br[n]:
        # TRUE gives [br[n], Inf), FALSE gives br[n] to the last bounded interval
        left.closed <- rep_len(left.closed, n - 1)
        closed.left <- c(left.closed, rightmost.closed)
        # A value sitting exactly on the break of a left-open interval
        # belongs to the interval on its left
        slideleft <- which(x %in% br & !closed.left[ifelse(r == 0, NA, r)])
        r[slideleft] <- r[slideleft] - 1
        rng <- 0:n
        if (trim) {
            # Keep only the bounded intervals 1..(n-1)
            r[r < 1 | r > n - 1] <- NA
            rng <- 1:(n - 1)
        }
        if (is.null(labels) || isTRUE(labels)) {
            # embed() pairs each break with its successor: column 2 = lower, 1 = upper
            ff <- format(embed(br, 2))
            labels <- paste0(
                ifelse(left.closed, "[", "("),
                ff[, 2], ", ", ff[, 1],
                ifelse(closed.left[-1], ")", "]")
            )
            if (!trim) {
                labels <- c(
                    paste0("(-Inf, ", ff[1, 2], ifelse(left.closed[1], ")", "]")),
                    labels,
                    paste0(ifelse(rightmost.closed, "[", "("), ff[nrow(ff), 1], ", Inf)")
                )
            }
        } else if (isFALSE(labels)) {
            labels <- NULL
        }
        if (!is.null(labels)) {
            r <- factor(r, levels = rng, labels = labels)
        }
        r
    }
With a list of breaks br <- c(4.25, 4.75, 4.90, 5.10, 5.25, 5.75), the normal behavior of findInterval creates breaks/labels with
    -Inf <  x <  4.25 : 0
    4.25 <= x <  4.75 : 1
    4.75 <= x <  4.90 : 2
    4.90 <= x <  5.10 : 3
    5.10 <= x <  5.25 : 4
    5.25 <= x <  5.75 : 5
            x >= 5.75 : 6
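The default behavior can be checked on a few hand-picked values, including ones that sit exactly on a break. The probe values below are made up for illustration. The same sketch shows that cut only offers a single right flag for all intervals, so neither setting keeps both 4.90 and 5.10 in the same bin.

```r
br <- c(4.25, 4.75, 4.90, 5.10, 5.25, 5.75)

# Illustrative probe values; several lie exactly on a break
probe <- c(4, 4.25, 4.8, 5.10, 5.75, 6)
idx <- findInterval(probe, br)
idx
# 0 1 2 4 6 6  (every break falls into the interval on its right)

# cut() closes the same side of every interval
left_closed  <- as.integer(cut(c(4.90, 5.10), br, right = FALSE))  # 3 4
right_closed <- as.integer(cut(c(4.90, 5.10), br, right = TRUE))   # 2 3
```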
However, if we add our new parameter left.closed, we can specify whether each of the regions defined by consecutive pairs of break values should be left closed (the default) or right closed. This vector should have a length one less than the length of the break vector.
We could get the breaks you desire with
    rr <- findInterval2(x, br, rightmost.closed = FALSE,
                        left.closed = c(TRUE, TRUE, TRUE, FALSE, TRUE), trim = TRUE)
which should create
    4.25 <= E <  4.75 : 1
    4.75 <= E <  4.90 : 2
    4.90 <= E <= 5.10 : 3
    5.10 <  E <  5.25 : 4
    5.25 <= E <= 5.75 : 5
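As a cross-check on the intended bins, here is a minimal sketch that tests each interval's bounds explicitly instead of going through findInterval. The names bin_explicit, lower.closed and upper.closed are hypothetical and exist only for this illustration, and the test values are made up.

```r
br <- c(4.25, 4.75, 4.90, 5.10, 5.25, 5.75)
lower <- head(br, -1)   # lower bound of each bounded interval
upper <- tail(br, -1)   # upper bound of each bounded interval

# Closure flags transcribed from the table of desired bins (assumed names)
lower.closed <- c(TRUE,  TRUE,  TRUE, FALSE, TRUE)
upper.closed <- c(FALSE, FALSE, TRUE, FALSE, TRUE)

# Hypothetical helper: index of the first interval containing each value, else NA
bin_explicit <- function(x) {
    vapply(x, function(xi) {
        above <- ifelse(lower.closed, xi >= lower, xi > lower)
        below <- ifelse(upper.closed, xi <= upper, xi < upper)
        hit <- which(above & below)
        if (length(hit) == 0) NA_integer_ else hit[1]
    }, integer(1))
}

E <- c(4.25, 4.90, 5.10, 5.2, 5.75, 6)
bins <- bin_explicit(E)
bins
# 1 3 3 4 5 NA
```

This is slower than findInterval on large inputs, but it makes every boundary decision visible, which is handy for spot-checking values that sit on a break.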
Note that testing for exact matches with numeric (decimal) values is very messy, because most decimal fractions cannot be stored exactly as floating-point numbers. So doing this stuff with continuous data is potentially flawed.
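A small sketch of why exact matching is fragile: values that look like they sit on a break can end up slightly above or below it after ordinary arithmetic. The break vectors here are made up for illustration.

```r
a <- 0.1 + 0.2            # looks like 0.3, stored slightly above it
on_break_a <- a %in% 0.3  # FALSE: not recognised as sitting on the break
bin_a <- findInterval(a, c(0, 0.3, 1))  # 2

b <- 0.7 + 0.1            # looks like 0.8, stored slightly below it
on_break_b <- b %in% 0.8  # FALSE
bin_b <- findInterval(b, c(0, 0.8, 1))  # 1, although 0.8 itself would give 2

# A tolerance-based comparison treats both as equal to the break
near_a <- isTRUE(all.equal(a, 0.3))     # TRUE
near_b <- isTRUE(all.equal(b, 0.8))     # TRUE
```

If the data come from computation rather than from typed-in values, rounding to a fixed number of decimals before binning is one way to make boundary cases predictable.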
Also note that this doesn't necessarily apply directly to histograms. This function can be used for binning and then creating a barplot if you would like to visualize the data. Histograms are really only for estimating the underlying density of continuous random variables, and if you are being this picky about break points, it seems like your data may be more discrete and you are interested in counts rather than densities.
For example, we can create test data with
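    set.seed(15)
    br <- c(4.25, 4.75, 4.90, 5.10, 5.25, 5.75)
    # 45 uniform draws spanning the full range of the breaks
    x <- runif(45, min(br), max(br))
    rr <- findInterval2(x, br, rightmost.closed = FALSE,
                        left.closed = c(TRUE, TRUE, TRUE, FALSE, TRUE), trim = TRUE)
    # Counts per bin, drawn as bars rather than as a density histogram
    barplot(table(rr))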
