Subset a dataframe, calculate the mean and populate a dataframe in a loop in R

Question

I have a set of 85 possible combinations from two variables, one with five values (years) and one with 17 values (locations). I make a dataframe that has the years in the first column and the locations in the second column. For each combination of year and location I want to calculate the weighted mean value and then add it to the third column, according to the year and location values.

My code is as follows:

for (i in unique(data1$year)) {
  for (j in unique(data1$location)) {
   data2 <- crossing(data1$year, data1$location)
   dataname <- subset(data1, year %in% i & location %in% j)
   result <- weighted.mean(dataname$length, dataname$raising_factor, na.rm = T)

  } 
}

The result I get puts the last calculated mean in the third column of every row.

How can I get it to add each mean to the row with the matching year and location combination?

Thanks.

Please make your [question reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Add expected output and add a subset of your data1 data.frame. — phiver, Aug 06 '18 at 10:49

score 2 · Answer 1 · answered Aug 06 '18 at 12:53

A base R option would be by():

by(df[c('x', 'y')], df[c('group', 'year')],
          function(x) weighted.mean(x[,1], x[,2]))

Based on @LAP's example
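The call above depends on the df from @LAP's example, so here is a self-contained sketch of the same idea. The names res, m, out and the column name wmean are only illustrative, and the conversion to a data frame via as.table() is one possible way to get a year/group/value layout, not necessarily what the answer intended.

```r
# Example data copied from @LAP's answer
df <- data.frame(group = rep(letters[1:5], each = 4),
                 year = rep(2001:2002, 10),
                 x = 1:20,
                 y = rep(c(0.3, 1, 1/0.3, 0.4), each = 5))

# Weighted mean of x (weights y) for every group/year combination
res <- by(df[c("x", "y")], df[c("group", "year")],
          function(d) weighted.mean(d[, 1], d[, 2]))

# res is a group-by-year array; dropping the "by" class lets it be
# indexed like an ordinary matrix
m <- unclass(res)
m["b", "2001"]   # weighted mean for group b in 2001

# One way to flatten it to one row per combination
# (wmean is a hypothetical column name)
out <- as.data.frame(as.table(m), responseName = "wmean")
out
```

Note that as.table() turns group and year into factors, so year may need converting back with as.integer(as.character(out$year)) before joining to other data.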

akrun


score 1 · Answer 2 · answered Aug 06 '18 at 11:01

As @A.Suleiman suggested, we can use dplyr::group_by.

Example data:

df <- data.frame(group = rep(letters[1:5], each = 4),
                 year = rep(2001:2002, 10),
                 x = 1:20,
                 y = rep(c(0.3, 1, 1/0.3, 0.4), each = 5))

library(dplyr)

df %>%
  group_by(group, year) %>%
  summarise(test = weighted.mean(x, y))

# A tibble: 10 x 3
# Groups:   group [?]
    group  year      test
   <fctr> <int>     <dbl>
 1      a  2001  2.000000
 2      a  2002  3.000000
 3      b  2001  6.538462
 4      b  2002  7.000000
 5      c  2001 10.538462
 6      c  2002 11.538462
 7      d  2001 14.000000
 8      d  2002 14.214286
 9      e  2001 18.000000
10      e  2002 19.000000
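If the explicit loop from the question is preferred over group_by, the following is a minimal sketch of how it could be made to fill the right row. It assumes the column names from the question (year, location, length, raising_factor); the data1 values below are made up for illustration, and wmean is a hypothetical name for the third column. The key changes are building the grid of combinations once, outside the loop, and writing each result into the row it belongs to instead of overwriting a single result variable.

```r
# Made-up stand-in for the asker's data1
data1 <- data.frame(
  year           = rep(2001:2002, each = 4),
  location       = rep(c("A", "B"), times = 4),
  length         = c(10, 20, 30, 40, 50, 60, 70, 80),
  raising_factor = c(1, 1, 3, 1, 2, 1, 2, 3)
)

# Build all year/location combinations once, before looping
data2 <- expand.grid(year = unique(data1$year),
                     location = unique(data1$location),
                     stringsAsFactors = FALSE)
data2$wmean <- NA_real_

# Loop over the rows of data2 and store each mean in its own row
for (k in seq_len(nrow(data2))) {
  rows <- data1$year == data2$year[k] &
          data1$location == data2$location[k]
  data2$wmean[k] <- weighted.mean(data1$length[rows],
                                  data1$raising_factor[rows],
                                  na.rm = TRUE)
}

data2
```

Combinations that have no rows in data1 end up with NaN in wmean, because weighted.mean() of an empty vector is NaN.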
