0

I created a histogram for a simulation, and now I need to find the total number of instances where the x-variable is greater than a given value. Specifically, my data is correlation (ranging from -1 to 1, with bin size 0.05), and I want to find the percent of the events where the correlation is greater than 0.1. Finding the total number of events greater than 0.1 is fine, because it's an easy percent to compute.

library(psych) 
library(lessR)

corrData=NULL 
for (i in 1:1000){ 
    x1 <- rnorm(mean=0, sd = 1, n=20) 
    x2 <- rnorm(mean=0, sd = 1, n=20) 
    data <- data.frame(x1,x2) 
    r <- with(data, cor(x1, x2)) 
    corrData <- append(corrData,r) 
} 

describe(corrData) 
hist <- hist(corrData, breaks=seq(-1,1,by=.05), main="N=20") 
describe(hist) count(0.1, "N=20")
Heroka
  • 12,889
  • 1
  • 28
  • 38
  • 3
    When asking for help, it's best to include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample data so we can answer with working code. – MrFlick Sep 21 '15 at 14:00
  • 2
    Your histogram's break points may not match up with the cutoff you care about. For the latter, try `mean(x>0.1)` – Frank Sep 21 '15 at 14:02
  • I'm not clear why you need a histogram for this. If you have a set of correlations, `x`, and want to know the fraction where `x > 0.1`, just use this: `sum(x>0.1)/length(x)`. Are you sure you don't want: `sum(abs(x)>0.1)/length(x)`. This gives the fraction where the magnitude of the corelation is >0.1... – jlhoward Sep 21 '15 at 14:26
  • This is an assignment for class, and I'm very new to R. I made a histogram for the distribution of correlations. The next step is to determine the % of samples with a correlation >0.1, >0.3, >0.5. library(psych) library(lessR) corrData=NULL for (i in 1:1000){ x1 <- rnorm(mean=0, sd = 1, n=20) x2 <- rnorm(mean=0, sd = 1, n=20) data <- data.frame(x1,x2) r <- with(data, cor(x1, x2)) corrData <- append(corrData,r) } describe(corrData) hist <- hist(corrData, breaks=seq(-1,1,by=.05), main="N=20") describe(hist) count(0.1, "N=20") – Bridgett Sep 21 '15 at 14:44
  • I have added this to your question. For the future: edit the question yourself. Don't provide code in the comments. – Heroka Sep 21 '15 at 15:37

2 Answers2

0

TRy something like this:

N=500
bh=hist(runif(N,-1,1))
#str(bh)
sum(bh$counts[bh$mids>=.1])/N
Robert
  • 5,038
  • 1
  • 25
  • 43
0

Look at what hist is actually giving you (see ?hist):

set.seed(10230)
x<-hist(2*runif(1000)-1)
> str(x)
List of 6
 $ breaks  : num [1:11] -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 ...
 $ counts  : int [1:10] 92 99 100 105 92 116 95 102 100 99
 $ density : num [1:10] 0.46 0.495 0.5 0.525 0.46 0.58 0.475 0.51 0.5 0.495
 $ mids    : num [1:10] -0.9 -0.7 -0.5 -0.3 -0.1 0.1 0.3 0.5 0.7 0.9
 $ xname   : chr "2 * runif(1000) - 1"
 $ equidist: logi TRUE
 - attr(*, "class")= chr "histogram"

The breaks list item tells you the endpoints of the "catching" intervals. The counts item tells you the counts in the (one fewer) bins defined by these breaks.

So, to get as close as you can to what you want using only your hist object, you could do:

sum(x$counts[which(x$breaks>=.1)-1L])/sum(x$counts)

But, as @Frank said, this may be incorrect, particularly if the bin containing .1 does not have an endpoint at .1.

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198