I am trying to calculate the Gini coefficient with sample weights for different groups in my data. I prefer to use aggregate because I later use its output to plot the coefficients. I have found alternative ways to do it, but in those cases the output was not exactly what I needed.
```r
library(reldist) # provides gini()
set.seed(1)      # make the random example reproducible
dat <- data.frame(country = rep(LETTERS, each = 10)[1:50],
                  replicate(3, sample(11, 10)),
                  year = sample(1990:1994, 50, TRUE),
                  wght = sample(1:5, 50, TRUE))
dat[51, ] <- c(NA, 11, 2, 6, 1992, 3) # add one more row with NA for country
gini(dat$X1)                          # unweighted Gini for the whole sample
gini(dat$X1, weights = dat$wght)      # weighted Gini: what I actually need
print(a1 <- aggregate(X1 ~ country + year, data = dat, FUN = gini))
# Works perfectly fine without weights.
```
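To make the role of the weights concrete, here is a small base-R stand-in for a weighted Gini. `wgini` is a hypothetical helper (not part of reldist) based on the mean-absolute-difference definition, G = E|X - X'| / (2 * mean); it is only meant to show that an integer weight behaves like repeating that observation. In real code you would keep calling reldist's `gini(x, weights)`.

```r
# Hypothetical stand-in for reldist::gini(x, weights):
# weighted mean absolute difference divided by twice the weighted mean.
wgini <- function(x, w = rep(1, length(x))) {
  w <- w / sum(w)                                      # normalise weights to sum to 1
  mu <- sum(w * x)                                     # weighted mean
  sum(outer(w, w) * abs(outer(x, x, "-"))) / (2 * mu)
}

wgini(c(1, 2, 3, 4))          # equal weights: 0.25
wgini(c(1, 2), w = c(2, 1))   # weight 2 on the value 1 ...
wgini(c(1, 1, 2))             # ... gives the same value (1/6) as repeating it
```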
But how can I pass the weights to gini within aggregate? I know there are other ways (as shown here):
```r
print(b1 <- by(dat, list(dat$country, dat$year), function(x) gini(x$X1, x$wght))[])
# by() works with weights, but the output is a full country-by-year table with NA for every empty combination
print(s1 <- sapply(split(dat, dat$country), function(x) gini(x$X1, x$wght)))
# This seems to be a good alternative, but I couldn't find a way to split by two variables
library(plyr)
print(p1 <- ddply(dat, .(country, year), summarise, value = gini(X1, wght)))
# Yet another alternative, but now the output includes a group for the missing (NA) country
```
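For the sapply/split route, `split()` also accepts a list of grouping vectors, and `drop = TRUE` discards country-year combinations that have no rows. Rows whose country is NA fall out as well, because an NA grouping value is not assigned to any group. A minimal sketch, assuming the same example data and using a hypothetical `wgini` stand-in so it runs without reldist (replace it with reldist's `gini(x, weights)` in practice):

```r
# Hypothetical stand-in for reldist::gini(x, weights)
wgini <- function(x, w = rep(1, length(x))) {
  w <- w / sum(w)
  mu <- sum(w * x)
  sum(outer(w, w) * abs(outer(x, x, "-"))) / (2 * mu)
}

set.seed(1)
dat <- data.frame(country = rep(LETTERS, each = 10)[1:50],
                  replicate(3, sample(11, 10)),
                  year = sample(1990:1994, 50, TRUE),
                  wght = sample(1:5, 50, TRUE))
dat[51, ] <- c(NA, 11, 2, 6, 1992, 3)   # row with a missing country

# Split by country AND year; drop = TRUE removes empty combinations,
# and the NA-country row is not placed in any group
groups <- split(dat, list(dat$country, dat$year), drop = TRUE)
s2 <- sapply(groups, function(x) wgini(x$X1, x$wght))

# Turn the "A.1990"-style names back into country and year columns
keys <- do.call(rbind, strsplit(names(s2), ".", fixed = TRUE))
s2_df <- data.frame(country = keys[, 1],
                    year = as.integer(keys[, 2]),
                    X1 = unname(s2))
s2_df
```

Splitting the names on "." works here only because neither the country codes nor the years contain a dot.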
If someone could show me a way to use the weighted gini function within aggregate, that would be very helpful, as aggregate produces the output in exactly the form I need. Otherwise, I guess I will work with one of the alternatives.
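One possible way to keep aggregate is to aggregate row numbers instead of X1. The formula interface hands FUN only the values of the left-hand-side variable for each group, so if that variable holds row numbers, FUN can use them to look up both X1 and wght. A minimal sketch, where the `idx` column and the `wgini` stand-in are hypothetical (swap in reldist's `gini(x, weights)` in practice). The NA-country row is dropped by aggregate's default `na.action = na.omit`, as in the unweighted call; FUN indexes into `dat` directly, so `dat` must not be reordered between adding `idx` and calling aggregate.

```r
# Hypothetical stand-in for reldist::gini(x, weights)
wgini <- function(x, w = rep(1, length(x))) {
  w <- w / sum(w)
  mu <- sum(w * x)
  sum(outer(w, w) * abs(outer(x, x, "-"))) / (2 * mu)
}

set.seed(1)
dat <- data.frame(country = rep(LETTERS, each = 10)[1:50],
                  replicate(3, sample(11, 10)),
                  year = sample(1990:1994, 50, TRUE),
                  wght = sample(1:5, 50, TRUE))
dat[51, ] <- c(NA, 11, 2, 6, 1992, 3)   # row with a missing country

# Hypothetical helper column: each row's own row number
dat$idx <- seq_len(nrow(dat))

# FUN receives the row numbers of one country-year group and uses them
# to pull the matching X1 values and weights out of dat
a2 <- aggregate(idx ~ country + year, data = dat,
                FUN = function(i) wgini(dat$X1[i], dat$wght[i]))
names(a2)[names(a2) == "idx"] <- "X1"   # same column layout as a1
a2
```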