
I am trying to calculate the Gini coefficient with sample weights for different groups in my data. I prefer to use aggregate because I later use its output to plot the coefficients. I found alternative ways to do it, but in those cases the output wasn't exactly what I needed.

library(reldist) #for the gini() function
set.seed(1) #seed added so the sampled data are reproducible
dat <- data.frame(country=rep(LETTERS, each=10)[1:50],
                  replicate(3, sample(11, 10)),
                  year=sample(c(1990:1994), 50, TRUE),
                  wght=sample(c(1:5), 50, TRUE))
dat[51,] <- c(NA,11,2,6,1992,3) #add one more row with NA for country

gini(dat$X1) #usual gini for all
gini(dat$X1, weights=dat$wght) #gini with weights; that's what I actually need
print(a1<-aggregate(X1 ~ country+year, data=dat, FUN=gini))
#Works perfectly fine without weights.

But how can I specify the weights within aggregate? I know there are other ways (as shown here):

print(b1<-by(dat, list(dat$country, dat$year), function(x) gini(x$X1, x$wght))[])
#by() works with weights, but the output has NAs in it

print(s1<-sapply(split(dat, dat$country), function(x) gini(x$X1, x$wght)))
#This seems to be a good alternative, but I couldn't find a way to split by two variables (see the sketch below)

library(plyr)
print(p1<-ddply(dat, .(country,year), summarise, value=gini(X1,wght)))
#yet another alternative, but the output includes NAs for the missing country
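Those NA rows can be dropped beforehand; a sketch, assuming the same dat as above:

#sketch: filter out the NA country before grouping
print(p2<-ddply(subset(dat, !is.na(country)), .(country,year), summarise, value=gini(X1,wght)))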

If someone could show me a way to use the weighted gini function within aggregate, that would be very helpful, as it produces the output exactly in the way I need. Otherwise, I guess I will work with one of the alternatives.


1 Answer

 #using aggregate (WARNING: gives wrong answers; extra arguments in ... are passed
 #whole to FUN, so each group's gini() receives the full-length weight vector,
 #misaligning the weights and normalizing by the total sum; see comments below)
    aggregate( X1 ~ country+year, data=dat, FUN=gini, weights=dat$wght)
 #using data.table
    library(data.table)
    DT<-data.table(dat)
    DT[,list(mygini=gini(X1,wght)),by=.(country,year)]
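 #sketch: to drop the row with NA country (see comments below), filter inside data.table
    DT[!is.na(country),list(mygini=gini(X1,wght)),by=.(country,year)]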

 #Using dplyr
    library(dplyr)
    dat %>%
      group_by(country, year) %>%
      summarise(mygini=gini(X1, wght))
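If aggregate-style output is really needed, one workaround (a sketch, only lightly tested) is to aggregate row indices so the values and weights can be subset together per group; the formula interface also drops the NA country row via na.omit, matching the a1 output:

 #sketch: aggregate row indices, then subset values and weights per group
    idx <- seq_len(nrow(dat))
    a2 <- aggregate(idx ~ country+year, data=dat,
                    FUN=function(i) gini(dat$X1[i], dat$wght[i]))
    names(a2)[3] <- "mygini"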
  • I couldn't run your code using dplyr. The one with data.table works fine, but the output includes the observation where country is NA. – Eva Feb 28 '15 at 16:55
  • Yes, it includes NA for the countries with NA. You have to remove these NAs if you don't need them. You need `library(dplyr)` for `dplyr`. – Metrics Feb 28 '15 at 17:00
  • Thanks, but that is exactly why I asked for a solution with aggregate (if possible!). Otherwise ddply gives the same output anyway. And yes, my mistake! I thought the necessary package was plyr. – Eva Feb 28 '15 at 17:07
  • Just saw the addition. That's exactly what I needed, thanks a lot! Such a simple thing! I should have thought of that... – Eva Feb 28 '15 at 17:10
  • No problem. I am puzzled why it gives an error for `wght` in weights but accepts `dat$wght`. – Metrics Feb 28 '15 at 17:13
  • Oh, I am very sorry! I was too quick to accept it. There is something wrong with this. I don't know what's happening, but the results change so drastically that they can't be true. For example: with no sample weights the gini varies between .40 and .70 across countries. With your aggregate suggestion the values are around 0.007... If you run the data.table and aggregate options on the same small data you will also see that the aggregate output is drastically different. Incidentally, how do you write commands in that format in a comment window? – Eva Feb 28 '15 at 17:20
  • Yeah, you are correct. The values are substantially different for `aggregate`. You can see that `dplyr` and `data.table` give the same solution and are not substantially different from the unweighted gini. I don't understand what you mean by comment window? I just use `#` and then start typing. – Metrics Feb 28 '15 at 17:29
  • OK, thanks! I removed the `aggregate` comment from the answer as it was giving a misleading solution. And never mind the other question, it was a formatting thing. I was trying to find a way to write aggregate as `aggregate`. I found it! – Eva Feb 28 '15 at 17:37