I have a CSV file in which the first column is a value and the second column is the number of times that value appears. Basically, it's an empirical probability distribution. Now I want to use R to calculate a confidence interval: what is the interval at the 95% confidence level, and what about 90%, 85%, and so on?

I searched for hours but couldn't find a proper way to do that. Sorry for my stupidity.

Thanks, J

thelatemail
  • Are you asking about the mathematical theory of calculating confidence intervals of a sample? If so, then StackOverflow is not the right venue. If not, then please provide what you have tried to do so far. (Reading about [reproducible questions/examples](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) might help.) – r2evans Mar 11 '15 at 04:16
  • It sounds like the only thing that is missing is an example in R coding. Since the help page for the appropriate function has one, I'm just including it in an answer. – IRTFM Mar 11 '15 at 04:21

1 Answer

Sounds like you want a weighted quantile function. The Hmisc package provides one:

install.packages("Hmisc")
library(Hmisc)  # attach the package so wtd.var, wtd.quantile, etc. are found
# the first example from the help page for ?wtd.quantile
set.seed(1)
x <- runif(500)
wts <- sample(1:6, 500, TRUE)
std.dev <- sqrt(wtd.var(x, wts))
wtd.quantile(x, wts)
#-----------
#          0%         25%         50%         75%        100% 
# 0.001836858 0.262917845 0.482080115 0.747400865 0.996077372 
death <- sample(0:1, 500, TRUE)
plot(wtd.loess.noiter(x, death, wts, type='evaluate'))
describe(~x, weights=wts)
#-----------
# x 
#
#  2  Variables      500  Observations
# ---------------------------------------------------------------------------
# x 
#       n missing  unique    Info    Mean     .05     .10     .25     .50 
#    1766       0     500       1   0.502 0.07068 0.11890 0.26292 0.48208 
#     .75     .90     .95 
# 0.74740 0.91162 0.95515 
#
# lowest : 0.001837 0.001933 0.011150 0.013078 0.013390
# highest: 0.991839 0.991906 0.992684 0.993749 0.996077 
# ----------------------------------------------------------------------------
# (weights) 
#       n missing  unique    Info    Mean 
#    1766       0       6    0.95   4.364 
#
#            1   2   3   4   5   6
# Frequency 87 138 282 296 465 498
# %          5   8  16  17  26  28
# ----------------------------------------------------------------------------
# describe uses wtd.mean, wtd.quantile, wtd.table
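
For the value/count layout described in the question, a minimal base-R sketch might look like the following. It assumes that "confidence interval" here means the central interval containing a given fraction of the observations, and that the counts are whole numbers. The data frame `dat`, its column names `value` and `count`, and the helper `central_interval` are made up for illustration; with a real file, `dat` would come from something like `read.csv()`.

```r
# Hypothetical data standing in for the CSV file:
# one column of values, one column of how often each value occurs.
dat <- data.frame(value = c(1, 2, 3, 4, 5),
                  count = c(2, 5, 10, 5, 2))

# Repeat each value by its count so the ordinary quantile() can be used.
expanded <- rep(dat$value, times = dat$count)

# Central interval covering a fraction `level` of the observations:
# cut off (1 - level) / 2 of the distribution in each tail.
central_interval <- function(x, level) {
  alpha <- 1 - level
  quantile(x, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

conf_levels <- c(0.95, 0.90, 0.85)
intervals <- t(sapply(conf_levels, function(l) central_interval(expanded, l)))
dimnames(intervals) <- list(paste0(conf_levels * 100, "%"), c("lower", "upper"))
print(intervals)
```

Expanding with `rep()` is fine for modest counts. For very large counts, passing the counts as weights, e.g. `wtd.quantile(dat$value, weights = dat$count, probs = c(0.025, 0.975))`, avoids building the expanded vector.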
IRTFM