
I am using the ineq package in R to calculate the Gini coefficient. From inspecting the source code (below), I can see that the function sorts the vector x before computing the coefficient.

Example data:

example_data = data.frame(
    SCORE_RANGE = c('100-200', '201-300', '301-400', '401-500', '501-600'),
    NUMBER_OF_OBSERVATIONS = c(100, 100, 100, 100, 100),
    NUMBER_OF_NON_EVENT = c(85, 90, 95, 90, 90),
    NUMBER_OF_EVENT = c(15, 10, 5, 10, 10))

Source code of Gini function from ineq package:

Gini = function (x, corr = FALSE, na.rm = TRUE) 
{
    if (!na.rm && any(is.na(x))) 
        return(NA_real_)
    x <- as.numeric(na.omit(x))
    n <- length(x)
    x <- sort(x)
    G <- sum(x * 1L:n)
    G <- 2 * G/sum(x) - (n + 1L)
    if (corr) 
        G/(n - 1L)
    else G/n
}
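To make the effect of the sort(x) call concrete, here is a minimal sketch that repeats the arithmetic of the function above on the event counts from the example data. The name gini_sketch is hypothetical and only covers the default corr = FALSE path with no missing values; it is not the package function itself.

```r
# Hypothetical re-statement of the quoted function body
# (corr = FALSE, no NA handling), for illustration only.
gini_sketch <- function(x) {
  x <- sort(as.numeric(x))                 # ascending order, as in the package code
  n <- length(x)
  G <- 2 * sum(x * seq_len(n)) / sum(x) - (n + 1)
  G / n
}

events <- c(15, 10, 5, 10, 10)             # NUMBER_OF_EVENT, in score order

# sorted x = 5, 10, 10, 10, 15; sum(x * 1:5) = 170
# 2 * 170 / 50 - 6 = 0.8; 0.8 / 5 = 0.16
g_score_order <- gini_sketch(events)
g_reversed    <- gini_sketch(rev(events))  # same values, different input order

print(g_score_order)
print(g_reversed)
```

Because the vector is sorted internally, any permutation of the input gives the same result, so the score order of the bins plays no role in this calculation.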

I am doing this for my credit score models. I have binned the data into score ranges of equal frequency and then ordered the bins by score (smallest to largest).

Using the Gini function from the ineq package gives 0.16. Is this correct in this context, given that the function reorders the vector before computing? If not, what should the correct Gini coefficient be?

Gini(example_data$NUMBER_OF_EVENT) 
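For comparison, here is a hedged sketch of a calculation that does respect the score order of the bins. It assumes that the Gini meant in a credit-scoring context is the discriminatory-power measure 2 * AUC - 1 (also called the accuracy ratio or Somers' D), which is an assumption about the intent and not something the ineq package computes. The variable names are illustrative.

```r
# Counts per bin, ordered from lowest to highest score range.
events     <- c(15, 10, 5, 10, 10)   # NUMBER_OF_EVENT
non_events <- c(85, 90, 95, 90, 90)  # NUMBER_OF_NON_EVENT

# Cumulative shares of events and non-events, starting from 0.
cum_ev  <- c(0, cumsum(events) / sum(events))
cum_non <- c(0, cumsum(non_events) / sum(non_events))

# Area under the curve of cumulative events against cumulative
# non-events, using the trapezoid rule across the bins.
auc <- sum(diff(cum_non) * (head(cum_ev, -1) + tail(cum_ev, -1)) / 2)

# Assumed definition: Gini = 2 * AUC - 1.
gini_auc <- 2 * auc - 1
print(round(gini_auc, 4))  # 0.0889
```

Unlike the sorted-vector formula, this value changes if the bins are put in a different order, because the cumulative shares depend on the order of the score ranges.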
Khiem Nguyen
  • It is not clear how you binned the data. If sorting the ranges gives the right order, i.e. the smallest at the top of the list and so on, then the function can be applied. – Nar Aug 26 '18 at 13:23
  • @Nar: I have binned it into groups of equal frequency. E.g. score range (100-200) and score range (201-300) both have 100 observations. My point is that I have already sorted by score (smallest to largest), and the function reorders my data again to calculate the Gini coefficient. It is this second reordering that I am unsure about. – Khiem Nguyen Aug 26 '18 at 14:53
  • You have to add an average score for each bin to calculate the Gini index. Here I assume it is exactly the midpoint between the bin limits, though based on your data it could be another number, and then apply the Gini function directly. This looks like the easiest way: example_data$x <- c(150, 250, 350, 450, 550); Gini(example_data$x) – Nar Aug 27 '18 at 09:31
  • I don't think that is the case, since the score ranges are just model output. The model is a classifier that separates events from non-events. – Khiem Nguyen Aug 27 '18 at 14:20
