How to calculate the percentage of data points belonging to a range of values?

Question

Given a table of values (say between 0 and 100) and the attached plot, what would be the simplest way in R to calculate how many of the data points fall between 20 and 60 (the red box in the image)?

And is there a way to create that red box using R's plotting functions (I did it using an image editor...)?

Thanks for the help.

Josh O'Brien · Accepted Answer · 2012-10-19T18:04:29.260

13

To calculate the probability mass contained within the interval:

x <- rnorm(1e6)  ## data forming your empirical distribution
ll <- -1.96      ## lower bound of interval of interest
ul <- 1.96       ## upper bound of interval of interest

sum(x > ll & x < ul)/length(x)
# [1] 0.949735
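
For the 0 to 100 data described in the question, the same calculation can be sketched as a percentage. This is only an illustration: the runif() data is a hypothetical stand-in for the real table, and the bounds are treated as inclusive (>= and <=), which is an assumption about what "between 20 and 60" should mean.

```r
set.seed(1)                             # hypothetical data, seeded for reproducibility
x  <- runif(1000, min = 0, max = 100)   # stand-in for the table of values
ll <- 20                                # lower bound of the red box
ul <- 60                                # upper bound of the red box

# The mean of a logical vector is the fraction of TRUE values,
# so this is equivalent to sum(...) / length(x), scaled to percent.
pct_inside <- 100 * mean(x >= ll & x <= ul)
pct_inside
```

If the data may contain missing values, mean(..., na.rm = TRUE) computes the percentage over the non-missing values only.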

And then to plot the histogram and the red box:

h <- hist(x, breaks=100, plot=FALSE)       # Calculate but don't plot histogram
maxct <- max(h$counts)                     # Extract height of the tallest bar
## Or, if you want the height of the tallest bar within the interval
# start <- findInterval(ll, h$breaks)
# end   <- findInterval(ul, h$breaks)
# maxct <- max(h$counts[start:end])

plot(h, ylim=c(0, 1.05*maxct), col="blue") # Plot, leaving a bit of space up top

rect(xleft = ll, ybottom = -0.02*maxct,    # Add box extending a bit above
     xright = ul, ytop = 1.02*maxct,       # and a bit below the bars
     border = "red", lwd = 2)


edited Oct 19 '12 at 18:04

answered Oct 19 '12 at 16:53

Josh O'Brien


@mrdwab -- Thanks. I had missed that, thinking that it was just a very well illustrated question, I guess! – Josh O'Brien Oct 19 '12 at 17:27
You could calculate maxct for the interval and not the whole histogram. – Roland Oct 19 '12 at 17:28

@Roland -- OK. Just added a fix for that as well. Done with this now. – Josh O'Brien Oct 19 '12 at 17:41

Roland · Answer 2 · 2012-10-19T17:16:05.270

8

set.seed(42)
x <- rlnorm(5000)                                     # some data
hist(x)                                               # histogram
rect(7, -50, 10, 100, border = "red")                 # red rectangle
table(cut(x, breaks = c(0, 7, 10, Inf))) / length(x)  # fraction of values in each interval
#   (0,7]   (7,10] (10,Inf]
#  0.9754   0.0136   0.0110

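cut() assigns each value to the interval it falls in; by default the intervals are open on the left and closed on the right, as the labels show. table() then counts the values in each interval, and dividing by the total count length(x) turns those counts into fractions.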
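
As a small sketch of the same cut()/table() idea applied to the 20 to 60 range from the question: the values below are hypothetical, and prop.table() is used here in place of dividing by length(x).

```r
x <- c(5, 20, 25, 40, 60, 61, 80, 100)        # hypothetical values in (0, 100]

# Default right = TRUE gives the intervals (0,20], (20,60], (60,100]
bins   <- cut(x, breaks = c(0, 20, 60, 100))
counts <- table(bins)                         # 2, 3, 3
props  <- prop.table(counts)                  # 0.25, 0.375, 0.375
props
```

Note that prop.table() divides by the number of values that landed in some interval, so it hides any value that fell outside the breaks, whereas dividing by length(x) makes such values show up as a total below 1.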

edited Oct 19 '12 at 17:16

answered Oct 19 '12 at 16:58

Roland


Rather than only post a block of code, please *explain* why this code solves the problem posed. Without an explanation, this is not an answer. – Martijn Pieters Oct 19 '12 at 17:04

I disagree. I commented the code and everything else can be found using R's excellent help system. – Roland Oct 19 '12 at 17:05
I think something in between might have been useful here, which is the point Martijn makes. You stand greater chance of getting +1's that way. – Gavin Simpson Oct 19 '12 at 17:07
This is great, only problem I am having is that I needed to switch length(x) with nrow(x). Not sure if it is related but for some reason the values returned by the cut/length do not sum to 1. – user971956 Oct 19 '12 at 17:20
@user971956 Well, if you contributed any [reproducible code](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) we could tailor the answers to your specific case. – Roland Oct 19 '12 at 17:22
Roland, I am using the code you have supplied, the only difference is that my data is loaded from a large CSV file into a table, and all values of interest are in column 4... – user971956 Oct 19 '12 at 17:30

@user971956 `x <- df[,4]`, where df is your data.frame. – Roland Oct 19 '12 at 17:32

Found why it doesn't sum to 1, some of the values are equal to 0, so the first parameter in the breaks needs to be equal to -Inf. Thank you very much for your help. – user971956 Oct 19 '12 at 18:08
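
The two issues from this comment thread can be reproduced in a small sketch. The data frame and its column names are hypothetical; include.lowest = TRUE is shown as one alternative to using -Inf as the first break.

```r
df <- data.frame(id = 1:6, value = c(0, 0, 3, 8, 9, 12))  # hypothetical data

# length() of a data.frame is its number of columns, so extract the column first
x <- df[, "value"]

# With 0 as the first break, the zeros fall outside (0,7] and become NA
open_left <- cut(x, breaks = c(0, 7, 10, Inf))
sum(table(open_left)) / length(x)    # below 1, because table() drops the NAs

# Closing the first interval on the left puts the zeros in [0,7]
closed <- cut(x, breaks = c(0, 7, 10, Inf), include.lowest = TRUE)
table(closed) / length(x)            # fractions now sum to 1
```

Calling table(open_left, useNA = "ifany") is a quick way to see how many values fell outside the breaks.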
