
I was trying to do this: (the following dataframe is just to show the idea)

      a     b     c
[1,]  1     1     2
[2,]  1     3     5
[3,]  2     2     4
[4,]  2     1     5

Here 'a' is the factor that splits the rows into two groups. Within each group I want the weighted mean of 'b', using 'c' as the weight (or, strictly, c/sum(c) as the weight). I couldn't find a function that operates on more than one variable grouped by the same factor.

In this example, I want to get two means:

group a=1: (1*2+3*5)/(2+5)=17/7

group a=2: (2*4+1*5)/(4+5)=13/9

I'm new to R so this is really hard for me to handle. Hope you guys could spare a few seconds to comment. Thanks very much!

divibisan
  • Pick your favorite answer from the [Mean by Group R-FAQ](https://stackoverflow.com/q/11562656/903061), and use `weighted.mean` instead of `mean`. – Gregor Thomas May 09 '18 at 16:18
  • Thank you! I just saw the comment, and I will try to understand that code! (I think I missed the page you linked because I searched for something other than “mean” at first.) – JustCallMeGary May 09 '18 at 17:43
  • The term *"operate two variables with one factor"* doesn't make sense to me. I think you meant *"group_by/split on levels of a factor"*. Tagged [tag:group-by] – smci Aug 18 '18 at 01:59

1 Answer


We can convert the matrix (going by the structure shown) to a data.frame, group by 'a', and summarise by taking the sum of the product of 'b' and 'c' divided by the sum of 'c'

library(dplyr)
m1 %>%
   as.data.frame %>% # if it is a matrix
   group_by(a) %>%
   summarise(new = sum(b*c)/sum(c))
# A tibble: 2 x 2
#       a   new
#   <int> <dbl>
#1     1  2.43
#2     2  1.44

data

m1 <- structure(c(1L, 1L, 2L, 2L, 1L, 3L, 2L, 1L, 2L, 5L, 4L, 5L), .Dim = c(4L, 
3L), .Dimnames = list(NULL, c("a", "b", "c")))
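
As a cross-check on the numbers above, here is a minimal base R sketch that needs no packages. It rebuilds the same example data with `matrix()` and uses `weighted.mean`, with 'b' as the values and 'c' as the weights; the names `df` and `res` are only illustrative.

```r
# Same example data as above, one column per variable
m1 <- matrix(c(1, 1, 2, 2,   # a: grouping factor
               1, 3, 2, 1,   # b: values to average
               2, 5, 4, 5),  # c: weights
             ncol = 3, dimnames = list(NULL, c("a", "b", "c")))
df <- as.data.frame(m1)

# Split the rows by 'a', then take the weighted mean of 'b' with weights 'c'
res <- sapply(split(df, df$a), function(g) weighted.mean(g$b, g$c))
res  # named vector: group "1" is 17/7, group "2" is 13/9
```

`weighted.mean` normalises the weights itself, so passing 'c' directly is equivalent to weighting by c/sum(c).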
akrun
  • Thanks for your patience! I ran the code and it gives similar output (A tibble: 109 x 2 in my case). However, the values in the 'new' column are all the same. I don't know what I did wrong... If 'local' is already a data frame, can I do this: `local%>%group_by(local$bond)%>%summarize(new=crossprod(local$gross,local$latestytm)/sum(local$gross))`? Now it works! Thank you so much! I was too careless. This really helps. – JustCallMeGary May 09 '18 at 16:13
  • @JustCallMeGary Perhaps you also loaded `plyr`, which masks `summarise`. You could call it explicitly with `%>% dplyr::summarise(new = sum(..`. Also, in `group_by` it should be `group_by(bond)` and not `group_by(local$bond)` (check my syntax), i.e. the column name and not the column values – akrun May 09 '18 at 16:14
  • `weighted.mean(x = b, w = c)` – Gregor Thomas May 09 '18 at 16:17
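
To illustrate the point made in the comments, here is a small sketch of why `local$gross` inside `summarise` gives the same value in every row. The column names `bond`, `gross` and `latestytm` come from the comment above; the data values are made up for illustration, and it is only an assumption that `gross` is the weight and `latestytm` the value being averaged.

```r
library(dplyr)

# Hypothetical stand-in for the 'local' data frame from the comment
local <- data.frame(bond      = c("x", "x", "y", "y"),
                    gross     = c(2, 5, 4, 5),   # weights
                    latestytm = c(1, 3, 2, 1))   # values

# local$gross refers to the whole column, so the grouping is ignored
wrong <- local %>%
  group_by(bond) %>%
  dplyr::summarise(new = sum(local$gross * local$latestytm) / sum(local$gross))

# Bare column names are evaluated within each group
right <- local %>%
  group_by(bond) %>%
  dplyr::summarise(new = weighted.mean(latestytm, gross))
```

In `wrong`, both groups get the overall weighted mean, while `right` has one weighted mean per bond.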