I'm trying to write a function in R that calculates the Gini coefficient (a measure of income inequality) for a given set of incomes and population shares. This is what I have so far:
incomes <- c(1175,1520,1865,2210,2555) # incomes
population <- rep(1/5,5)*100 # population shares (5 times 1/5)
income <- incomes*population/sum(incomes*population) # income * frequency / total income
data <- as.data.frame(cbind(incomes,income,population/100))
names(data) <- c("incomes","income","population")
data <- data[order(as.numeric(data$incomes)),] # sort by income, ascending
for (i in seq_len(nrow(data))){ # share of the population richer than group i
data$richer[i] <- 1-sum(data$population[1:i])
}
data$score <- data$income * (data$population + 2 * data$richer) # each group's contribution
gini <- round(1-sum(data$score),4) # Gini coefficient, rounded to 4 decimals
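Since the goal is a function, the steps above could be wrapped up roughly as follows. This is only a sketch: the name gini_score and its argument names are made up for illustration, and it uses cumsum in place of the loop, which should give the same shares up to floating-point rounding.

```r
# Hypothetical helper; the name and arguments are illustrative, not from any package.
gini_score <- function(incomes, population) {
  ord <- order(incomes)                            # sort groups from poorest to richest
  incomes <- incomes[ord]
  population <- population[ord] / sum(population)  # normalise shares so they sum to 1
  income <- incomes * population / sum(incomes * population)  # income share per group
  richer <- 1 - cumsum(population)                 # population share richer than each group
  1 - sum(income * (population + 2 * richer))
}

gini_score(c(1175, 1520, 1865, 2210, 2555), rep(1/5, 5))  # approximately 0.148
```

Normalising the population inside the function means it accepts shares given either as fractions or as percentages.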
This all works well. Now I want to plot the income distribution, so I build a new data set:
data$population2 <- data$richer + data$population # cumulative
x <- data.frame(population2 = rev(seq(0.05,1,0.05))) # 1, 0.95, ..., 0.05
library(plyr) # join() comes from the plyr package
data.graph <- join(x, data, by = "population2")
So data$population2 has the values 1, 0.8, 0.6, 0.4, 0.2, and x$population2 has the values 1, 0.95, 0.9, 0.85, 0.8, and so on down to 0.05. However, join only matches the rows for 1, 0.8 and 0.2; the rows for 0.6 and 0.4 are left unmatched, although they should match too. Can anyone help me out?
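A likely explanation, which I have not confirmed for this exact case, is floating-point representation: the two population2 columns are computed along different arithmetic paths (1 minus a running sum on one side, seq on the other), so values that print as 0.6 or 0.4 may differ in their last binary digits, and the join compares them exactly. The sketch below is one way to check and work around this. It makes a few assumptions: it uses base R's merge instead of join so that it runs without extra packages, it rebuilds only the relevant columns (with cumsum, so the last digits may not be identical to the loop version), and the integer column key is a hypothetical name introduced here.

```r
# Rebuild the two key columns from the question (approximately the same values).
population <- rep(1/5, 5)
richer <- 1 - cumsum(population)
data <- data.frame(incomes = c(1175, 1520, 1865, 2210, 2555),
                   population2 = richer + population)
x <- data.frame(population2 = rev(seq(0.05, 1, 0.05)))

# Print with 17 significant digits to reveal differences hidden by default printing.
print(data$population2, digits = 17)
print(x$population2, digits = 17)

# Possible workaround: match on an integer key (whole percent) instead of doubles.
data$key <- as.integer(round(data$population2 * 100))
x$key <- as.integer(round(x$population2 * 100))
data.graph <- merge(x, data[, c("key", "incomes")], by = "key", all.x = TRUE)
data.graph <- data.graph[order(-data.graph$key), ]  # from 100 percent down to 5 percent
```

Integers are compared exactly, so all five rows should find their partner. Joining with join(x, data, by = "key") should presumably behave the same way, though I have only reasoned through the merge version here.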