I want to calculate confidence intervals for a Gini coefficient, and I tried to use the boot() function for that purpose, as shown below (as suggested here):

library(reldist) # just for the gini() function
library(boot) # for the boot() function
x <- c(1,2,2,3,4,99)
gini(x)
y <- boot(x, gini, 500)
quantile(y$t, probs=c(0.025, 0.975))

It works perfectly fine in this example. But when I try to apply it to my actual data, I receive a warning ("In sum(weights) : integer overflow - use sum(as.numeric(.))") and the function doesn't work. Instead of the quantiles, I receive an error: "Error in quantile.default(y2$t, probs = c(0.025, 0.975)) : missing values and NaN's not allowed if 'na.rm' is FALSE"

I am not entirely sure whether the integer overflow warning and this error are related, but I guess so. I read several posts on integer overflow (for example this one) and downloaded the Rmpfr package (as suggested here). Without a very clear understanding of what's happening, I tried to enter x as an mpfr object, but it didn't work. Also, my variable is already numeric, so simply converting it with as.numeric (as suggested here) is not the solution either.
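
For reference, the warning text is the one base R's sum() emits when the total of an integer vector exceeds the 32-bit integer range. The sketch below only illustrates that behaviour; the size 130000 and the idea that the integers being summed are row indices rather than the (numeric) data are assumptions, not something confirmed for the actual data set.

```r
# Illustrative size only (an assumption, not the real data)
n <- 130000L
idx <- seq_len(n)                        # integer vector 1..n

# Summing integers past the 32-bit limit yields NA plus the
# "integer overflow - use sum(as.numeric(.))" warning
int_total <- suppressWarnings(sum(idx))

# Converting to double first avoids the overflow
num_total <- sum(as.numeric(idx))

is.na(int_total)                         # TRUE
num_total > .Machine$integer.max         # TRUE
```

An NA produced this way would propagate into any statistic computed from it, which would be consistent with quantile() later complaining about missing values.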

Is there an alternative way to calculate error bounds for the Gini coefficient, or a solution to this problem?

Eva
  • Maybe set `na.rm=TRUE` in the call to `quantile` if that is an appropriate step in your analysis, which I guess it would be. You may be able to investigate more into `gini` to understand why it generates the NA to begin with. – vpipkt Feb 26 '15 at 21:23
  • Setting `na.rm=TRUE` did not solve the problem. I just receive NAs for both quantiles. – Eva Feb 26 '15 at 21:42
  • So what is in `y` in this case? Could you post your data or data that replicates this problem? – vpipkt Feb 26 '15 at 21:47
  • Interestingly, when I produce a small subset of my data, the function works perfectly fine. I tried over and over again but couldn't replicate the problem. It looks like the problem occurs only if I choose a very large sample or try to apply the function to the entire data set, which has around 130000 cases! I am happy to post a Dropbox link to share the variable if it's OK to do so. – Eva Feb 26 '15 at 22:14
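
One possible explanation, offered as a sketch rather than a confirmed diagnosis: boot() calls its statistic as statistic(data, indices), and the second argument of reldist's gini() is weights, so passing gini directly would hand the resampled indices to gini() as weights. With roughly 130000 cases the integer indices would sum past the 32-bit limit, which matches the sum(weights) warning and the fact that small subsets work. A wrapper that subsets the data explicitly avoids this. The helper names gini_unweighted and gini_stat below are hypothetical, and the base-R Gini formula is swapped in only so the example runs without reldist.

```r
library(boot)  # boot() is shipped with standard R installations

# Hypothetical base-R helper: unweighted Gini coefficient of a numeric vector
gini_unweighted <- function(v) {
  v <- sort(as.numeric(v))
  n <- length(v)
  sum((2 * seq_len(n) - n - 1) * v) / (n * sum(v))
}

# boot() passes (data, indices); subset explicitly so the indices are
# used for resampling and never reach the statistic as weights.
# With reldist loaded, the analogous wrapper would be
#   function(data, idx) gini(data[idx])
gini_stat <- function(data, idx) gini_unweighted(data[idx])

set.seed(1)                        # only to make the sketch reproducible
x <- c(1, 2, 2, 3, 4, 99)
y <- boot(x, gini_stat, R = 500)
ci <- quantile(y$t, probs = c(0.025, 0.975))
ci
```

The same wrapper pattern applies to any statistic whose second argument is not a vector of indices. boot.ci(y, type = "perc") from the boot package is another way to obtain a percentile interval from the same object.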
