I want the mean of V1, weighted by V2, for each combination of V3 and V4. I tried aggregate, but it doesn't work: aggregate passes each column to weighted.mean separately, so no weights are supplied and the result is the same as mean:

df <- data.frame(V1=seq(1, 1.5, 0.1),
                 V2=seq(6, 1, -1),
                 V3=c('A','A','A','B','B','B'),
                 V4=c('F','P','P','P','F','F'))
aggregate(cbind(V1, V2) ~ V3 + V4, data=df, weighted.mean)

  V3 V4   V1  V2
1  A  F 1.00 6.0
2  B  F 1.45 1.5
3  A  P 1.15 4.5
4  B  P 1.30 3.0
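
The output above can be checked by hand for a single group. The following is a minimal sketch, using the V1 and V2 values of the V3 = 'A', V4 = 'P' group; the names x, w, unweighted and weighted are only illustrative. When weighted.mean is called with no weights, it uses equal weights, which is why the aggregate call returns plain means.

```r
# V1 and V2 values for the V3 = 'A', V4 = 'P' rows of df
x <- c(1.1, 1.2)
w <- c(5, 4)

# No weights supplied: weighted.mean() uses equal weights
unweighted <- weighted.mean(x)

# Weights supplied: the result is sum(x * w) / sum(w)
weighted <- weighted.mean(x, w)

print(unweighted)  # same value as mean(x)
print(weighted)    # same value as sum(x * w) / sum(w)
```

The first value corresponds to the 1.15 that aggregate reports for this group; the second is the weighted result that is actually wanted.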

So I had to fall back on split and sapply, plus a fair amount of bookkeeping:

df1 <- expand.grid(unique(df$V3), unique(df$V4))
colnames(df1) <- c('V3','V4')
df1$CrossTerm <- interaction(df1$V3, df1$V4)

temp <- sapply(split(df, interaction(df$V3, df$V4)), function(df) weighted.mean(df$V1, df$V2))
df2 <- data.frame(WeightedAvg=temp)
df2$CrossTerm <- rownames(df2)
df3 <- merge(df1, df2, by="CrossTerm")
df3$CrossTerm <- NULL
df3

  V3 V4 WeightedAvg
1  A  F    1.000000
2  A  P    1.144444
3  B  F    1.433333
4  B  P    1.300000

But it's an ugly and possibly buggy solution. Is there a better way to get a result like this? The format matters and should match my output as closely as possible (the row order may differ).

user3684014

1 Answer

You can do this pretty easily with data.table aggregation:

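library(data.table)
setDT(df)  # convert df to a data.table by reference

# j is evaluated once per (V3, V4) group, so V1 and V2 are both available
df[, list(wtdMean = weighted.mean(V1, V2)), keyby = "V3,V4"]

   V3 V4  wtdMean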
1:  A  F 1.000000
2:  A  P 1.144444
3:  B  F 1.433333
4:  B  P 1.300000
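
For comparison, the same per-group idea can be sketched in base R without data.table. This is not part of the approach above, only an illustration under the assumption that df is the data frame from the question; the names groups, weighted_by_group and res are hypothetical. The point is that the function receives the whole sub-data-frame for each group, so it can see V1 and V2 together.

```r
df <- data.frame(V1 = seq(1, 1.5, 0.1),
                 V2 = seq(6, 1, -1),
                 V3 = c('A', 'A', 'A', 'B', 'B', 'B'),
                 V4 = c('F', 'P', 'P', 'P', 'F', 'F'))

# One data frame per (V3, V4) combination; drop = TRUE skips empty combinations
groups <- split(df, list(df$V3, df$V4), drop = TRUE)

# Build a one-row result for a single group (hypothetical helper)
weighted_by_group <- function(g) {
  data.frame(V3 = g$V3[1],
             V4 = g$V4[1],
             WeightedAvg = weighted.mean(g$V1, g$V2))
}

# Stack the one-row results and discard the group labels used as row names
res <- do.call(rbind, lapply(groups, weighted_by_group))
rownames(res) <- NULL
res
```

The row order may differ from the data.table output, since split orders groups by the combined factor levels rather than by a key.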
nrussell