Weighted mean by group in long-data

Question

My questing is kinda similar to this question, and is building on this answer, only thing is that my data is long format, not wide, and I would like to keep it that way.

Wondered if there's smart way to calculate the weighted.mean() shown in this answer, but with long data.

Say my data lookslike this

library(tidyverse)

dft_w <- structure(list(obs = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L), education = c("A", 
"A", "B", "B", "B", "B", "A", "A"), Item = c("income", "weight", 
"income", "weight", "income", "weight", "income", "weight"), 
    Amount = c(1000L, 10L, 2000L, 1L, 1500L, 5L, 2000L, 2L)), row.names = c(NA, 
-8L), class = c("tbl_df", "tbl", "data.frame")); dft_w
# A tibble: 8 x 4
    obs education Item   Amount
  <int> <chr>     <chr>   <int>
1     1 A         income   1000
2     1 A         weight     10
3     2 B         income   2000
4     2 B         weight      1
5     3 B         income   1500
6     3 B         weight      5
7     4 A         income   2000
8     4 A         weight      2

and I would like to get to something like this

# A tibble: 12 x 4
     obs education Item            Amount
   <int> <chr>     <chr>            <dbl>
 1     1 A         income           1000 
 2     1 A         weight             10 
 3     1 A         weighted_income  1167.
 4     2 B         income           2000 
 5     2 B         weight              1 
 6     2 B         weighted_income  1583.
 7     3 B         income           1500 
 8     3 B         weight              5 
 9     3 B         weighted_income  1583.
10     4 A         income           2000 
11     4 A         weight              2 
12     4 A         weighted_income  1167.

r2evans · Accepted Answer · 2021-09-01T12:15:39.430

dft_w %>%
  group_by(education) %>%
  summarize(
    Amount = rep(weighted.mean(Amount[Item == "income"], Amount[Item == "weight"]), length(unique(obs))), 
    obs = unique(obs), 
    Item = "weighted_income"
  ) %>%
  bind_rows(dft_w, .) %>%
  arrange(obs, education, Item)
# # A tibble: 12 x 4
#      obs education Item            Amount
#    <int> <chr>     <chr>            <dbl>
#  1     1 A         income           1000 
#  2     1 A         weight             10 
#  3     1 A         weighted_income  1167.
#  4     2 B         income           2000 
#  5     2 B         weight              1 
#  6     2 B         weighted_income  1583.
#  7     3 B         income           1500 
#  8     3 B         weight              5 
#  9     3 B         weighted_income  1583.
# 10     4 A         income           2000 
# 11     4 A         weight              2 
# 12     4 A         weighted_income  1167.

Note that this will error if the data does not contain equal numbers of "income" and "weight" (erring with 'x' and 'w' must have the same length).

This can be preempted with sufficient filtering, perhaps this:

dft_w %>%
  slice(-1) %>%                         # just to trigger the fail, test the filter
  group_by(obs, education) %>%
  filter(all(c("income", "weight") %in% Item)) %>%
  group_by(education) %>%
  summarize(
    Amount = rep(weighted.mean(Amount[Item == "income"], Amount[Item == "weight"]), length(unique(obs))), 
    obs = unique(obs), 
    Item = "weighted_income"
  ) %>%
  bind_rows(slice(dft_w, -1), .) %>%    # slice() only to keep the output consistent
  arrange(obs, education, Item)
# # A tibble: 10 x 4
#      obs education Item            Amount
#    <int> <chr>     <chr>            <dbl>
#  1     1 A         weight             10 
#  2     2 B         income           2000 
#  3     2 B         weight              1 
#  4     2 B         weighted_income  1583.
#  5     3 B         income           1500 
#  6     3 B         weight              5 
#  7     3 B         weighted_income  1583.
#  8     4 A         income           2000 
#  9     4 A         weight              2 
# 10     4 A         weighted_income  2000

noting that the obs/education pair without both will not gain the "weighted_income" value.

score 1 · Answer 2 · answered Sep 01 '21 at 12:38

Another way around is to use tidyr's pivot_wider and pivot_longer in the same pipe chain so you can actually work with wide data before going back to long format. It may not be the most efficient way but it allows to keep "wide-format" tips & tricks.

library(dplyr)
dft_w %>% 
  tidyr::pivot_wider(names_from = Item, values_from = Amount) %>% 
  group_by(education) %>% 
  mutate(weighted_income = weighted.mean(income, weight)) %>% 
  tidyr::pivot_longer(3:last_col(), names_to = "Item", values_to = "Amount")

Output:

# A tibble: 12 x 4
# Groups:   education [2]
     obs education Item            Amount
   <int> <chr>     <chr>            <dbl>
 1     1 A         income           1000 
 2     1 A         weight             10 
 3     1 A         weighted_income  1167.
 4     2 B         income           2000 
 5     2 B         weight              1 
 6     2 B         weighted_income  1583.
 7     3 B         income           1500 
 8     3 B         weight              5 
 9     3 B         weighted_income  1583.
10     4 A         income           2000 
11     4 A         weight              2 
12     4 A         weighted_income  1167.

score 0 · Answer 3 · answered Sep 01 '21 at 12:26

Here is just another way of doing this using tibble::add_row. I just opted for only one summary per grouping variable:

library(dplyr)
library(purrr)

dft_w %>%
  group_split(education) %>%
  map_dfr(~ .x %>% 
            add_row(obs = .x$obs[1], education = .x$education[1],
                    Item = "weighted.mean", Amount = weighted.mean(.x$Amount[.x$Item == "income"], 
                                                                   .x$Amount[.x$Item == "weight"])))

# A tibble: 10 x 4
     obs education Item          Amount
   <int> <chr>     <chr>          <dbl>
 1     1 A         income         1000 
 2     1 A         weight           10 
 3     4 A         income         2000 
 4     4 A         weight            2 
 5     1 A         weighted.mean  1167.
 6     2 B         income         2000 
 7     2 B         weight            1 
 8     3 B         income         1500 
 9     3 B         weight            5 
10     2 B         weighted.mean  1583.

Weighted mean by group in long-data

3 Answers3